cerisara,
@cerisara@mastodon.online

On CPU only:

For inference, the best option right now is llama.cpp with a quantized LLM in GGUF format. There are several high-level wrappers around llama.cpp that make it easy to use: ollama, vllama...
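
A minimal sketch of what that looks like with the llama-cpp-python bindings (the GGUF file name and model choice below are placeholders, not recommendations):

```python
# Sketch using the llama-cpp-python bindings (pip install llama-cpp-python).
# The GGUF path is a placeholder: point it at any quantized model you have downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf",  # hypothetical local file
    n_ctx=4096,    # context window size
    n_threads=8,   # CPU threads to use
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain GGUF quantization in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```

Wrappers like ollama hide these details behind a single command, but the underlying engine is the same.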

For inference with a very big LLM and very little RAM, the only option is airLLM: it's slow, but you can run llama3-70b.
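
A rough sketch of how that is used, assuming airLLM's AutoModel interface (the model id and exact arguments here are assumptions, so check the project's README before relying on them):

```python
# Sketch only: assumes airLLM's AutoModel wrapper (pip install airllm).
# airLLM streams the model layer by layer from disk, which is why a 70B model
# fits in small RAM but generation is slow.
from airllm import AutoModel

model = AutoModel.from_pretrained("meta-llama/Meta-Llama-3-70B-Instruct")  # assumed model id

tokens = model.tokenizer(["Why is the sky blue?"], return_tensors="pt")
output = model.generate(tokens["input_ids"], max_new_tokens=32)
print(model.tokenizer.decode(output[0]))
```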

For finetuning a quantized LLM with LoRA, the only option afaik is also llama.cpp (look for "finetune"). It's a work in progress but usable and promising! Once you have an adapter, you can apply it at inference time, as sketched below.
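
A sketch of loading the resulting LoRA adapter on top of a quantized base model with the llama-cpp-python bindings (both file names are hypothetical):

```python
# Sketch: apply a LoRA adapter produced by llama.cpp's finetune example
# on top of a quantized GGUF base model, all on CPU.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/base-model.Q4_K_M.gguf",    # hypothetical quantized base model
    lora_path="./lora/my-finetuned-adapter.gguf",    # hypothetical adapter from finetuning
    n_threads=8,
)

print(llm("### Instruction: say hi\n### Response:", max_tokens=32)["choices"][0]["text"])
```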
