Long Sequence Modeling with XGen: A 7B LLM Trained on 8K Input Sequence Length

TLDR: We trained a series of 7B LLMs named XGen-7B with standard dense attention on up to 8K sequence length, for up to 1.5T tokens. We also fine-tuned the models on public-domain instructional data. The main takeaways are:

* On standard NLP benchmarks, XGen achieves comparable or better results than state-of-the-art open-source LLMs of similar size.
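For anyone who wants to poke at the released checkpoints, here's a minimal sketch of loading the 8K base model with Hugging Face transformers and generating from a prompt. It assumes the `Salesforce/xgen-7b-8k-base` checkpoint id and its custom tokenizer (hence `trust_remote_code=True`); adjust the dtype and device settings to your hardware.

```python
# Minimal sketch: load an XGen-7B checkpoint and generate text.
# Assumes the checkpoint id "Salesforce/xgen-7b-8k-base"; the model ships
# a custom tokenizer, so trust_remote_code=True is required.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Salesforce/xgen-7b-8k-base"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # use float16 on GPUs without bf16 support
    device_map="auto",           # requires the accelerate package
)

prompt = "Long-context language models are useful because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Standard dense attention over up to 8K tokens, so long prompts
# are handled the same way as short ones.
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```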

saplingtree

What do they mean by "Wikipedia-21 other languages"? Maybe the larger training set makes it effective at grammar checking.

wagesj45

They mention English Wikipedia with almost 20B tokens. Then 21 other languages at about 3B tokens per language. They just combined the other languages since they were all individually so much smaller than the English set.
