For inference, the best option right now is llama.cpp with a quantized LLM in GGUF format. There are several high-level wrappers around llama.cpp that make it easy to use: ollama, vllama...
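llama.cpp handles the GGUF parsing itself, but as a minimal stdlib-only sketch you can sanity-check that a downloaded file really is a GGUF model before pointing a runtime at it. The `gguf_version` helper below is illustrative, not part of any library; it relies only on the documented GGUF header layout (4-byte magic `GGUF`, then a little-endian uint32 version):

```python
import struct

# GGUF files start with the 4-byte magic "GGUF",
# immediately followed by a little-endian uint32 format version.
GGUF_MAGIC = b"GGUF"

def gguf_version(path):
    """Return the GGUF version number, or None if the file is not GGUF."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None
    return struct.unpack("<I", header[4:8])[0]
```

Handy for failing fast on a truncated or mislabeled download instead of getting a cryptic loader error later.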
For inference with a very big LLM and very little RAM, the only option is AirLLM: it's slow, but you can run llama3-70b
For finetuning a quantized LLM with LoRA, the only option afaik is also llama.cpp (look for "finetune"). It's a work in progress, but usable and promising!
@Jigsaw_You It's definitely a relative notion... Transformers are for sure much better at generalizing than pre-2017 NLP methods, and they very likely fall short of current expectations, and of future methods... 🙂
Create a Large Language Model from Scratch with Python Tutorial 👇🏼
Another fun tutorial from freeCodeCamp, focusing on building an LLM from scratch with Python. It covers topics such as:
✅ Handling and processing text
✅ Core PyTorch functions for text
✅ Basic language models
✅ Advanced methods
✅ Working with GPUs