cerisara,
@cerisara@mastodon.online

On CPU only:

For inference, the best option right now is llama.cpp with a quantized LLM in GGUF format. There are several high-level wrappers around llama.cpp that make it easy to use: ollama, vllama...
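
A minimal sketch of what that looks like with the llama-cpp-python bindings (the GGUF file name and model choice below are placeholders, not recommendations):

```python
# Sketch using the llama-cpp-python bindings (pip install llama-cpp-python).
# The GGUF path is a placeholder: point it at any quantized model you have downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf",  # hypothetical local file
    n_ctx=4096,    # context window size
    n_threads=8,   # CPU threads to use
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain GGUF quantization in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```

Wrappers like ollama hide these details behind a single command, but the underlying engine is the same.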

For inference with a very big LLM and very little RAM, the only option is airLLM: it's slow, but you can run llama3-70b.
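
A rough sketch of how that is used, assuming airLLM's AutoModel interface (the model id and exact arguments here are assumptions, so check the project's README before relying on them):

```python
# Sketch only: assumes airLLM's AutoModel wrapper (pip install airllm).
# airLLM streams the model layer by layer from disk, which is why a 70B model
# fits in small RAM but generation is slow.
from airllm import AutoModel

model = AutoModel.from_pretrained("meta-llama/Meta-Llama-3-70B-Instruct")  # assumed model id

tokens = model.tokenizer(["Why is the sky blue?"], return_tensors="pt")
output = model.generate(tokens["input_ids"], max_new_tokens=32)
print(model.tokenizer.decode(output[0]))
```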

For finetuning a quantized LLM with LoRA, the only option afaik is also llama.cpp (look for "finetune"). It's a work in progress but usable and promising! Once you have an adapter, you can apply it at inference time, as sketched below.
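
A sketch of loading the resulting LoRA adapter on top of a quantized base model with the llama-cpp-python bindings (both file names are hypothetical):

```python
# Sketch: apply a LoRA adapter produced by llama.cpp's finetune example
# on top of a quantized GGUF base model, all on CPU.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/base-model.Q4_K_M.gguf",    # hypothetical quantized base model
    lora_path="./lora/my-finetuned-adapter.gguf",    # hypothetical adapter from finetuning
    n_threads=8,
)

print(llm("### Instruction: say hi\n### Response:", max_tokens=32)["choices"][0]["text"])
```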
