simon,
@simon@simonwillison.net avatar

Here's the super-short version. For a ~4GB local model that should run on most laptops:

brew install llm # or pipx install llm  
llm install llm-gpt4all  
llm -m Meta-Llama-3-8B-Instruct "Three great names for a pet emu"  

This will download the model file the first time you call it, but the file is cached and reused for subsequent prompts.
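A sketch of a couple of follow-up commands once the model is cached, using standard llm CLI options (-c continues the most recent conversation):

llm -m Meta-Llama-3-8B-Instruct "Pick your favourite of those names"  
llm -c "Now write a haiku about it"  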

simon,
@simon@simonwillison.net avatar

Or if you want to try Llama 3 70B at maximum speed via an API, you can use https://groq.com/

Get an API key from https://console.groq.com/keys

Then run this:

llm install 'https://github.com/lexh/llm-groq/archive/ba9d7de74b3057b074a85fe99fe873b75519bd78.zip'  
llm keys set groq  
# <paste API key here>  
llm -m groq-llama3-70b 'a bad poem about a hungry pelican'  
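To sanity-check that the plugin registered its models, the standard llm models command lists everything available (a sketch - the grep is just a filter):

llm models | grep -i groq  
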
simon,
@simon@simonwillison.net avatar

If you have 64GB of RAM you may be able to run Llama 3 70B directly on your own machine - I got it working using llamafile; details are in the post here: https://simonwillison.net/2024/Apr/22/llama-3/#local-llama-3-70b-instruct-with-llamafile
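The general pattern from that post, for reference: a llamafile is a single self-contained executable, so the steps are roughly the following (the filename here is a placeholder - get the real download link from the post):

chmod +x Meta-Llama-3-70B-Instruct.llamafile  # placeholder filename  
./Meta-Llama-3-70B-Instruct.llamafile  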

nogweii,
@nogweii@nogweii.net avatar

@simon how much RAM does it grow to after interacting for a while? Is the 37GB pretty much the top end?

Also, how's the speed?

simon,
@simon@simonwillison.net avatar

@nogweii it seems to stay stable at 37GB - which is what I'd expect: LLMs are effectively stateless once you load the model weights.

I'm getting 7.5 tokens/second from the 70B llamafile model
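(For scale: at 7.5 tokens/second, a ~500-token answer takes about 500 / 7.5 ≈ 67 seconds.)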

soeren,
@soeren@brunk.io avatar

@simon Thanks for sharing! I realized that since llamafile is a wrapper around llama.cpp, we can also use the llama.cpp server directly with the llamafile plugin:

git clone https://github.com/ggerganov/llama.cpp.git  
cd llama.cpp  
make  
huggingface-cli download MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF --include '*Q5_K_M.gguf' --local-dir .  
./server -m Meta-Llama-3-70B-Instruct.Q5_K_M.gguf -c 8192 -n 2048  
llm -m llamafile "3 neat characteristics of a pelican"  

https://github.com/ggerganov/llama.cpp/tree/master/examples/server
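Since llamafile and the llama.cpp server both default to port 8080, the llm-llamafile plugin can talk to either. You can also query the server directly - a sketch against its documented /completion endpoint:

curl http://localhost:8080/completion -d '{"prompt": "3 neat characteristics of a pelican", "n_predict": 128}'  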

simon,
@simon@simonwillison.net avatar

@soeren That's a great example, thanks! Yeah, maybe I should have called the llm-llamafile plugin something else.
