nottheonion

GPT-4o’s Chinese token-training data is polluted by spam and porn websites (www.technologyreview.com)

Of the 100 results, only three of them are common enough to be used in everyday conversations; everything else consisted of words and expressions used specifically in the contexts of either gambling or pornography. The longest token, lasting 10.5 Chinese characters, literally means “_free Japanese porn video to watch.” Oops....
