simon,
@simon@simonwillison.net avatar

Here's the super-short version. For a ~4GB local model that should run on most laptops:

brew install llm # or pipx install llm  
llm install llm-gpt4all  
llm -m Meta-Llama-3-8B-Instruct "Three great names for a pet emu"  

This will download the model file the first time you call it, but the file is cached and reused for subsequent prompts.
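A sketch of a couple of follow-up commands once the model is cached, using standard llm CLI options (-c continues the most recent conversation):

llm -m Meta-Llama-3-8B-Instruct "Pick your favourite of those names"  
llm -c "Now write a haiku about it"  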

simon,
@simon@simonwillison.net avatar

Or if you want to try Llama 3 70B at maximum speed via an API, you can use https://groq.com/

Get an API key from https://console.groq.com/keys

Then run this:

llm install 'https://github.com/lexh/llm-groq/archive/ba9d7de74b3057b074a85fe99fe873b75519bd78.zip'  
llm keys set groq  
# <paste API key here>  
llm -m groq-llama3-70b 'a bad poem about a hungry pelican'  
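To sanity-check that the plugin registered its models, the standard llm models command lists everything available (a sketch - the grep is just a filter):

llm models | grep -i groq  
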
simon,
@simon@simonwillison.net avatar

If you have 64GB of RAM you may be able to run Llama 3 70B directly on your own machine - I got it working using llamafile; details are in the post here: https://simonwillison.net/2024/Apr/22/llama-3/#local-llama-3-70b-instruct-with-llamafile
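The general pattern from that post, for reference: a llamafile is a single self-contained executable, so the steps are roughly the following (the filename here is a placeholder - get the real download link from the post):

chmod +x Meta-Llama-3-70B-Instruct.llamafile  # placeholder filename  
./Meta-Llama-3-70B-Instruct.llamafile  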

nogweii,
@nogweii@nogweii.net avatar

@simon how much RAM does it grow to after interacting for a while? Is the 37GB pretty much the top end?

Also, how's the speed?

simon,
@simon@simonwillison.net avatar

@nogweii it seems to stay stable at 37GB - which is what I'd expect: LLMs are effectively stateless once you load the model weights.

I'm getting 7.5 tokens/second from the 70B llamafile model
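(For scale: at 7.5 tokens/second, a ~500-token answer takes about 500 / 7.5 ≈ 67 seconds.)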

soeren,
@soeren@brunk.io avatar

@simon Thanks for sharing! I realized that since llamafile is a wrapper around llama.cpp, we can also use the llama.cpp server directly with the llamafile plugin:

git clone https://github.com/ggerganov/llama.cpp.git  
cd llama.cpp  
make  
huggingface-cli download MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF --include '*Q5_K_M.gguf' --local-dir .  
./server -m Meta-Llama-3-70B-Instruct.Q5_K_M.gguf -c 8192 -n 2048  
llm -m llamafile "3 neat characteristics of a pelican"  

https://github.com/ggerganov/llama.cpp/tree/master/examples/server
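Since llamafile and the llama.cpp server both default to port 8080, the llm-llamafile plugin can talk to either. You can also query the server directly - a sketch against its documented /completion endpoint:

curl http://localhost:8080/completion -d '{"prompt": "3 neat characteristics of a pelican", "n_predict": 128}'  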

simon,
@simon@simonwillison.net avatar

@soeren That's a great example, thanks! Yeah, maybe I should have called the llm-llamafile plugin something else.
