seinecle,
@seinecle@ioc.exchange

LLM specialists: besides a powerful computer with a GPU, what kind of tricks can I use to make a locally hosted model spit out a response faster? Thx!

TedUnderwood,
@TedUnderwood@sigmoid.social

@seinecle I don’t actually have first-hand experience with this yet, though I want to soon.

My understanding is that memory (VRAM, or Apple’s unified memory) matters a lot. People also sacrifice some quality by quantizing models to make them smaller (see the sketch below).
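
For concreteness, here is a minimal sketch of the quantization idea using Hugging Face transformers with bitsandbytes 4-bit loading (one common way to do it on a CUDA GPU; llama.cpp GGUF files are another). The model id is just an illustrative placeholder, not a recommendation:

```python
# Minimal sketch: load a model in 4-bit to cut memory use and speed up
# local inference. Requires: pip install transformers accelerate bitsandbytes
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # example model, swap in your own

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store weights in 4-bit (NF4)
    bnb_4bit_compute_dtype=torch.float16,  # do the math in fp16 for speed
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on whatever GPU memory is available
)

inputs = tokenizer("Why does quantization speed things up?", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The trade-off is exactly the one described above: a 4-bit model is roughly a quarter the size of the fp16 original, so more of it fits in fast memory, at the cost of some output quality.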

seinecle,
@seinecle@ioc.exchange

@TedUnderwood very useful, thx. Also, looking at hosted servers with a GPU, starting prices are around $150/month (Hetzner). More expensive than my usual projects :/

TedUnderwood,
@TedUnderwood@sigmoid.social

@seinecle the advice I’ve been getting over on bsky is that you can do a lot locally with Apple silicon
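
As a sketch of what that looks like in practice: llama-cpp-python can offload a quantized GGUF model to the GPU via Metal on Apple silicon (assuming it was installed with Metal support). The model path here is a placeholder; point it at any quantized GGUF file you actually have:

```python
# Minimal sketch: GPU-offloaded local inference on Apple silicon with
# llama-cpp-python. Requires: pip install llama-cpp-python (built with Metal)
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model-q4_k_m.gguf",  # hypothetical quantized model file
    n_gpu_layers=-1,  # offload all layers to the GPU (Metal on Apple silicon)
    n_ctx=4096,       # context window; smaller values use less memory
)

out = llm("Explain quantization in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```

Because unified memory is shared between CPU and GPU, the practical limit on a Mac is simply how much RAM the machine has, which is why the memory advice above applies there too.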
