HackerNewsBot, SEQUOIA: Exact Llama2-70B on an RTX4090 with half-second per-token latency
https://news.ycombinator.com/item?id=40261965
#hackernews #tech