seinecle,
@seinecle@ioc.exchange

LLM specialists: besides a powerful computer with a GPU, what kind of tricks can I use to make a locally hosted model spit out a response faster? Thx!

TedUnderwood,
@TedUnderwood@sigmoid.social

@seinecle I don’t actually have first-hand experience with this yet, though I want to soon.

My understanding is that memory (VRAM, or Apple’s unified memory) matters a lot. People also sacrifice some quality by quantizing models to make them smaller (see the sketch below).
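
For concreteness, here is a minimal sketch of the quantization idea using Hugging Face transformers with bitsandbytes 4-bit loading (one common way to do it on a CUDA GPU; llama.cpp GGUF files are another). The model id is just an illustrative placeholder, not a recommendation:

```python
# Minimal sketch: load a model in 4-bit to cut memory use and speed up
# local inference. Requires: pip install transformers accelerate bitsandbytes
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # example model, swap in your own

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store weights in 4-bit (NF4)
    bnb_4bit_compute_dtype=torch.float16,  # do the math in fp16 for speed
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on whatever GPU memory is available
)

inputs = tokenizer("Why does quantization speed things up?", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The trade-off is exactly the one described above: a 4-bit model is roughly a quarter the size of the fp16 original, so more of it fits in fast memory, at the cost of some output quality.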

seinecle,
@seinecle@ioc.exchange

@TedUnderwood very useful, thx. Also, looking at hosted servers with a GPU, starting prices are around $150/month (Hetzner). More expensive than my usual projects :/

TedUnderwood,
@TedUnderwood@sigmoid.social

@seinecle the advice I’ve been getting over on bsky is that you can do a lot locally with Apple silicon
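
As a sketch of what that looks like in practice: llama-cpp-python can offload a quantized GGUF model to the GPU via Metal on Apple silicon (assuming it was installed with Metal support). The model path here is a placeholder; point it at any quantized GGUF file you actually have:

```python
# Minimal sketch: GPU-offloaded local inference on Apple silicon with
# llama-cpp-python. Requires: pip install llama-cpp-python (built with Metal)
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model-q4_k_m.gguf",  # hypothetical quantized model file
    n_gpu_layers=-1,  # offload all layers to the GPU (Metal on Apple silicon)
    n_ctx=4096,       # context window; smaller values use less memory
)

out = llm("Explain quantization in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```

Because unified memory is shared between CPU and GPU, the practical limit on a Mac is simply how much RAM the machine has, which is why the memory advice above applies there too.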
