Long Sequence Modeling with XGen: A 7B LLM Trained on 8K Input Sequence Length

TLDR: We trained a series of 7B LLMs named XGen-7B with standard dense attention on up to 8K sequence length, for up to 1.5T tokens. We also fine-tuned the models on public-domain instructional data. The main takeaways are:

* On standard NLP benchmarks, XGen achieves comparable or better results than state-of-the-art open-source LLMs of similar size.
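For anyone who wants to poke at the released checkpoints, here's a minimal sketch of loading the 8K base model with Hugging Face transformers and generating from a prompt. It assumes the `Salesforce/xgen-7b-8k-base` checkpoint id and its custom tokenizer (hence `trust_remote_code=True`); adjust the dtype and device settings to your hardware.

```python
# Minimal sketch: load an XGen-7B checkpoint and generate text.
# Assumes the checkpoint id "Salesforce/xgen-7b-8k-base"; the model ships
# a custom tokenizer, so trust_remote_code=True is required.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Salesforce/xgen-7b-8k-base"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # use float16 on GPUs without bf16 support
    device_map="auto",           # requires the accelerate package
)

prompt = "Long-context language models are useful because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Standard dense attention over up to 8K tokens, so long prompts
# are handled the same way as short ones.
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```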

saplingtree

What do they mean by "Wikipedia-21 other languages"? Maybe the larger training set makes it effective at grammar checking.

wagesj45

They mention English Wikipedia with almost 20B tokens. Then 21 other languages at about 3B tokens per language. They just combined the other languages since they were all individually so much smaller than the English set.
