kjr (@kjr@babka.social)

It is difficult to understand why Meta, a company that handles multilingual big data, trained Llama 2 almost entirely on English data. Only 2% of the data is non-English, and another 8.3% is of unknown language or is non-language data (such as code).
Even for internal use within the company, this doesn't meet their needs.

Meta Warns Its Latest Large Language Model ‘May Not Be Suitable’ for Non-English Use

https://slator.com/meta-warns-large-language-model-may-not-be-suitable-non-english-use/
