[Resource] Llama3 70B Successfully Deployed on a Single 4GB GPU

The open-source language model Llama3 has been released, and its 70B version can reportedly be run locally on a single GPU with only 4GB of VRAM using the AirLLM framework. Llama3's performance is described as comparable to GPT-4 and Claude3 Opus, and its success is attributed to a large increase in training data and to improvements in training methods. The model's architecture is essentially unchanged from Llama2, but its training data has grown from 2T to 15T tokens, with a focus on quality filtering and deduplication. The development of Llama3 highlights the importance of data quality and the role of open-source culture in AI development, and it raises the question of how open-source models will fare against closed-source ones.

Summarized by Llama 3 70B Instruct
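
Editor's note: the 4GB figure is easier to follow with some back-of-the-envelope arithmetic. The sketch below assumes that AirLLM keeps only one transformer layer in VRAM at a time and streams the rest from disk, and it assumes the layer dimensions from the published Llama 3 70B configuration; neither detail is stated in the post. The `disk_bytes_per_sec` value in the usage lines is a hypothetical SSD speed, not a measurement, and the per-token estimate only holds if every layer is re-read from disk for each generated token.

```python
# Rough memory arithmetic for layer-by-layer inference.
# This is an illustrative sketch, not AirLLM's actual code.
# Dimensions are ASSUMED from the published Llama 3 70B configuration.
HIDDEN = 8192        # model (embedding) width
FFN = 28672          # feed-forward inner width
N_LAYERS = 80        # number of transformer blocks
N_HEADS = 64         # query heads
N_KV_HEADS = 8       # key/value heads (grouped-query attention)
VOCAB = 128256       # tokenizer vocabulary size
BYTES_PER_PARAM = 2  # fp16/bf16 weights


def layer_params() -> int:
    """Parameter count of one transformer block."""
    head_dim = HIDDEN // N_HEADS
    kv_width = N_KV_HEADS * head_dim
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_width  # q and o, then k and v
    mlp = 3 * HIDDEN * FFN                                   # gate, up, down projections
    norms = 2 * HIDDEN                                       # two RMSNorm weight vectors
    return attention + mlp + norms


def total_params() -> int:
    """Parameter count of the whole model (untied input/output embeddings)."""
    embeddings = 2 * VOCAB * HIDDEN
    final_norm = HIDDEN
    return N_LAYERS * layer_params() + embeddings + final_norm


def gib(n_bytes: float) -> float:
    """Convert bytes to GiB."""
    return n_bytes / 2**30


def seconds_per_token_lower_bound(disk_bytes_per_sec: float) -> float:
    """Time to stream all weights once, assuming they are re-read for every token."""
    return total_params() * BYTES_PER_PARAM / disk_bytes_per_sec


if __name__ == "__main__":
    print(f"one layer:   {gib(layer_params() * BYTES_PER_PARAM):.2f} GiB")
    print(f"whole model: {gib(total_params() * BYTES_PER_PARAM):.1f} GiB")
    # Hypothetical 2 GB/s SSD; real throughput varies widely.
    print(f"seconds per token (at least): {seconds_per_token_lower_bound(2e9):.0f}")
```

Under these assumptions a single layer is roughly 1.6 GiB in 16-bit precision, which fits in 4GB with room for activations, while the full model is roughly 131 GiB and cannot. The trade-off is speed: if the weights really are streamed from disk for each token, generation is bounded by disk bandwidth rather than by the GPU.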

Mechaguana, 26 days ago

I tried running Ollama with the Mistral model. You need a good graphics card to run your own LLM; I had to wait 20 minutes for one full response.

Granted, the laptop I was running it on was garbage, but it really put into perspective how expensive running an LLM can be.

This shit won't be free forever.

Aquila, 26 days ago

Only works on Apple silicon. Am I reading that right?

hyperhypervisor, 26 days ago

No, they just mention that only Apple silicon is supported if you're using macOS.

voracitude, 26 days ago

That's very cool. Any idea about tokens/sec performance, and on what hardware? For reference, my 3070 gets ~19-25 tokens/sec with Llama3 8B.
