Meta Releases Paper on SuperHOT Technique (8k Context Length via Positional Interpolation)

If you want to learn more about how 8k Context w/ SuperHOT was recently achieved (beyond the paper Meta shared), I highly recommend visiting kaiokendev's pages and posts below.

I was curious to hear more about SuperHOT myself, so I emailed kaiokendev and asked for learning material suggestions.

Here is what they shared with me. Thank you for this list, kaiokendev!

Recommendations from the Developer of SuperHOT (kaiokendev):

Here are some resources to help with learning about LLMs:

Andrej Karpathy's GPT from scratch:

Hugging Face's NLP Course:

And for training specifically:

Alpaca LoRA:


Community training guide:

Of course, for papers, I recommend reading anything in arXiv's CS - Computation and Language category (cs.CL) that looks interesting to you:

