scottjenson (@scottjenson@social.coop)

@simon
Simon, I'm working with @homeassistant a bit and we just had a fascinating discussion about 'nanoLLMs' that could run locally. They would NOT need the sum-total-of-all-human-knowledge but would really just be there as a smart parser for speech-to-text commands, keeping everything local. This is clearly still not trivial but hopefully one way to reduce the model size.

Do you know of any 'reduced' LLMs that could work in this more limited context?
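
A sketch of what that 'smart parser' step could look like, assuming a small model is already being served locally (here through Ollama's Python client; the model tag "phi3", the endpoint, and the action/target schema are illustrative choices, not anything Home Assistant defines):

    # Hedged sketch: turn a transcribed voice command into a structured intent
    # with a small local model. Assumes Ollama is running locally and a small
    # model has been pulled under the tag "phi3"; the JSON schema is made up.
    import json
    import ollama

    PROMPT = (
        'Convert this smart-home voice command into JSON with the keys '
        '"action" and "target". Respond with JSON only.\n'
        "Command: {command}\nJSON:"
    )

    def parse_command(command: str) -> dict:
        resp = ollama.generate(
            model="phi3",                        # illustrative tag for a small local model
            prompt=PROMPT.format(command=command),
            format="json",                       # constrain the output to valid JSON
        )
        return json.loads(resp["response"])

    print(parse_command("turn off the kitchen lights"))
    # e.g. {"action": "turn_off", "target": "kitchen lights"}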

aallan (@aallan@mastodon.social)

@scottjenson @simon @homeassistant It's a fascinating area that a lot of people are looking at right now. Moving LLMs to edge hardware is certainly possible; I'm running LLaMA locally on my phone, for instance. But you have to think about architectures. I've seen some interesting ones built around key framing and feeding LLMs from TinyML models that look potentially pretty powerful.

simon (@simon@simonwillison.net)

@scottjenson @homeassistant yes, I'm really interested in that kind of model. Phi-3 is one of the most interesting of those at the moment I think - only about a 2GB file so it should be usable on a Raspberry Pi

scottjenson (@scottjenson@social.coop)

@simon @homeassistant Excellent news, thank you. I'll get it running locally on my Mac just to get started.

simon (@simon@simonwillison.net)

@scottjenson @homeassistant I ran it with llamafile following the instructions in the official README and it worked great https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf
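
When run that way, llamafile also serves a local OpenAI-compatible endpoint, so the same model can be scripted against; a rough sketch (the port, placeholder API key, and model name follow the llamafile README defaults as I remember them, so check them against that README):

    # Hedged sketch: call a locally running llamafile through its
    # OpenAI-compatible endpoint; nothing leaves the machine.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8080/v1",    # llamafile's default local server
        api_key="sk-no-key-required",           # any placeholder string works locally
    )

    resp = client.chat.completions.create(
        model="LLaMA_CPP",                      # llamafile mostly ignores this name
        messages=[{"role": "user", "content": "In one sentence, what is a Raspberry Pi?"}],
    )
    print(resp.choices[0].message.content)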

scottjenson (@scottjenson@social.coop)

@simon @homeassistant Thanks again. I'll also give that a try. I'm currently running it with Ollama on a 5-year-old desktop (it was shockingly easy). It's only using 30% of the CPU when I ask it a question!

Even then, I'd (naively?) suggest that it is far more power than I need. But the fact that I can get this far in just 5 minutes has me shaking my head in disbelief.

simon (@simon@simonwillison.net)

@scottjenson @homeassistant Phi-3 is the first small model of that kind that has felt to me like it's capable of basic conversion tasks like summarization, RAG extraction, and extract-data-to-JSON. I was really impressed by it.
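
As an illustration of the extract-data-to-JSON side of that, a hedged sketch along the same lines as above (the input sentence, field names, and model tag are invented for the example):

    # Hedged sketch: pull structured fields out of free text with a small local
    # model via Ollama's chat API; the schema and sample text are made up.
    import json
    import ollama

    text = "The boiler was installed on 12 March 2019 by Acme Heating and is serviced yearly."

    resp = ollama.chat(
        model="phi3",
        messages=[
            {
                "role": "system",
                "content": 'Extract JSON with the keys "installed_on", "installer" '
                           'and "service_interval" from the text. Respond with JSON only.',
            },
            {"role": "user", "content": text},
        ],
        format="json",   # constrain the reply to valid JSON
    )
    print(json.loads(resp["message"]["content"]))
    # e.g. {"installed_on": "2019-03-12", "installer": "Acme Heating", "service_interval": "yearly"}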
