Fisch
@Fisch@discuss.tchncs.de

Fisch,

If it’s not available as an application, you should probably look into Docker Compose.
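For a service that only ships as a container image, a minimal `docker-compose.yml` along these lines is usually enough to get started. The service name, image, port, and volume path below are placeholders, not a specific recommendation:

```yaml
# Minimal sketch; "someservice" and every value here are placeholders.
services:
  someservice:
    image: example/someservice:latest
    restart: unless-stopped
    ports:
      - "8080:8080"        # host:container
    volumes:
      - ./data:/app/data   # persist the service's data next to the compose file
```

Then `docker compose up -d` starts it in the background, and `docker compose logs -f` follows its output.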

Fisch,

That’s probably a real answer from someone on Quora, then.

Fisch,

What I’m using is Text Generation WebUI with an 11B GGUF model from Hugging Face. I offloaded all layers to the GPU, which uses about 9 GB of VRAM. With GGUF models, you can choose how many layers to offload to the GPU so that it uses less VRAM; layers that aren’t offloaded use system RAM and the CPU instead, which is slower.
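The VRAM trade-off from partial offloading can be sketched with back-of-the-envelope arithmetic. Assuming layers are roughly equal in size (a simplification — the KV cache and non-repeating tensors also take memory), you can estimate how many layers fit in a given budget. The numbers below are illustrative, not measurements:

```python
def layers_that_fit(model_size_gb: float, n_layers: int, vram_budget_gb: float) -> int:
    """Estimate how many transformer layers fit in a VRAM budget,
    assuming all layers are about the same size (a simplification)."""
    per_layer_gb = model_size_gb / n_layers
    return min(n_layers, int(vram_budget_gb // per_layer_gb))

# e.g. a ~9 GB 11B GGUF quant with a hypothetical 48 layers:
print(layers_that_fit(9.0, 48, 9.0))  # → 48 (everything fits)
print(layers_that_fit(9.0, 48, 4.5))  # → 24 (half on GPU, half on CPU/RAM)
```

Backends built on llama.cpp expose this as a layer count (`--n-gpu-layers` on the CLI, `n_gpu_layers` in llama-cpp-python); any value at or above the model’s layer count offloads everything.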

Fisch,

Didn’t know that. I’m actually the sole moderator of a community I made on lemmy.ml, so that’s good to know. I still have my old account on lemmy.ml as a moderator too, though.
