LChoshen,
@LChoshen@sigmoid.social

Pretrain to predict the future
At each step the model predicts the next n tokens instead of just one
Performance: 😃
Inference: up to 3× faster
Training time: same

Meta AI
Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Rozière, David Lopez-Paz, Gabriel Synnaeve

https://arxiv.org/abs/2404.19737
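
To make "predicts the next n tokens" concrete, here is a minimal sketch of how the training targets change. The function name `multi_token_targets` and the toy sentence are made up for illustration; this only shows the data layout, not the paper's actual architecture or loss.

```python
def multi_token_targets(tokens, n):
    """Pair each prefix with the n tokens that follow it.

    With n=1 this is ordinary next-token prediction. Positions that have
    fewer than n future tokens are dropped (an assumption of this sketch).
    """
    examples = []
    for t in range(len(tokens) - n):
        context = tokens[: t + 1]           # everything seen so far
        targets = tokens[t + 1 : t + 1 + n]  # the next n tokens to predict
        examples.append((context, targets))
    return examples


tokens = ["the", "cat", "sat", "on", "the", "mat"]
for context, targets in multi_token_targets(tokens, n=4):
    print(context, "->", targets)
```

Each position gets n targets instead of one, so the same text supplies more prediction signal per step.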

kellogh,
@kellogh@hachyderm.io

@LChoshen what is “sample efficiency”? is that data efficiency? i.e. the amount of training data needed to get the same performance

also why is n=4 only 3 times faster? why not 4x?
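
One possible way to think about the 3x vs 4x gap, as a rough sketch: assume the speedup comes from self-speculative decoding, where the extra heads draft tokens and a draft only counts if it passes verification along with all the drafts before it. The acceptance rates below are hypothetical numbers, not figures from the paper, and `expected_tokens_per_pass` is an illustrative helper.

```python
def expected_tokens_per_pass(accept_probs):
    """Expected tokens emitted per forward pass.

    The next-token head always yields 1 token. Draft k is kept only if
    drafts 1..k are all accepted, so its contribution is the running
    product of the acceptance probabilities (assumed independent here).
    """
    expected = 1.0
    all_accepted_so_far = 1.0
    for p in accept_probs:
        all_accepted_so_far *= p
        expected += all_accepted_so_far
    return expected


# n=4 means 3 extra draft heads; these rates are made up for illustration.
print(expected_tokens_per_pass([1.0, 1.0, 1.0]))  # every draft accepted: 4 tokens per pass
print(expected_tokens_per_pass([0.9, 0.8, 0.7]))  # imperfect drafts: fewer than 4
```

Under these assumptions, 4x is the ceiling you only reach if every drafted token is accepted; any rejected draft pulls the average tokens per pass below n. This counts tokens per forward pass only and ignores any per-pass overhead.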
