Smaug-72B-v0.1: The New Open-Source LLM Roaring to the Top of the Leaderboard (huggingface.co)
Abacus.ai:...
From the abstract: “Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}.”...
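The abstract's ternary weights can be illustrated with a short sketch. This is a hedged, pure-Python illustration of the absmean-style rounding the BitNet b1.58 paper describes (scale each weight by the mean absolute value, then round and clip to {-1, 0, 1}); the function name and the exact epsilon are illustrative, not from any released BitNet code.

```python
# Illustrative sketch of ternary ("1.58-bit") weight quantization as
# described in the BitNet b1.58 abstract: every weight becomes -1, 0, or +1.
# absmean scaling follows the paper's description; names are hypothetical.

def absmean_quantize(weights, eps=1e-8):
    """Scale weights by their mean absolute value, then round to {-1, 0, 1}."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale

q, s = absmean_quantize([0.4, -1.2, 0.05, 0.9, -0.3])
print(q)  # every entry is -1, 0, or +1
```

The payoff claimed in the paper is that matrix multiplies against {-1, 0, 1} weights reduce to additions and subtractions, with the per-tensor scale applied once at the end.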
Based on DeepSeek Coder, the current SOTA 33B model. It allegedly has GPT-3.5 levels of performance; I'll be excited to test it once I've made exllamav2 quants, and will try to update with my findings as a copilot model
here is the refiner: huggingface.co/…/stable-diffusion-xl-refiner-1.0
Context: Falcon is a popular free LLM; this is their biggest model yet, and they claim it's now the best open model on the market.
A short journey to long-context models....
cross-posted from: https://lemmy.world/post/135600...
I’ve been using TheBloke’s Q8 of huggingface.co/teknium/OpenHermes-2.5-Mistral-7B, but now this one (huggingface.co/…/OpenHermes-2.5-neural-chat-7b-v3…) I think is killing it. Has anyone else tested it?
Description:...
This release is trained on a curated filtered subset of most of our GPT-4 augmented data....
The open-source language model Llama3 has been released, and it has been confirmed that it can be run locally on a single GPU with only 4GB of VRAM using the AirLLM framework. Llama3’s performance is comparable to GPT-4 and Claude3 Opus, and its success is attributed to its massive increase in training data and technical...
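The 4GB-VRAM claim rests on layer-by-layer inference: rather than holding every transformer layer in memory at once, load one layer from disk, apply it to the activations, and free it before loading the next. A hedged sketch of that idea, with toy stand-ins (the names and the dict of "layer files" are hypothetical, not AirLLM's actual API):

```python
# Sketch of layered inference: only one layer's weights are resident
# at a time, so peak memory is bounded by the largest single layer.

def run_layered(hidden, layer_files, load_layer, free_layer):
    """Apply each layer in sequence, keeping only one in memory at a time."""
    for path in layer_files:
        layer = load_layer(path)   # bring one layer's weights into memory
        hidden = layer(hidden)     # apply it to the current activations
        free_layer(layer)          # release before loading the next layer
    return hidden

# Toy stand-ins: each "file" holds a scale factor applied to the activation.
weights = {"layer0.bin": 2.0, "layer1.bin": 0.5, "layer2.bin": 3.0}
load = lambda path: (lambda h: h * weights[path])
out = run_layered(1.0, ["layer0.bin", "layer1.bin", "layer2.bin"], load, lambda l: None)
print(out)  # 3.0
```

The trade-off is throughput: every forward pass re-reads weights from disk, so this buys memory headroom at the cost of latency.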
This one is based on Llama 2. The first one worked very well for rule and structure following with Guidance, so I’m highly intrigued to see if this lives up to its predecessor.
A very capable chat model built on top of the new Mistral MoE model, trained on the SlimOrca dataset for 1 epoch, using QLoRA.
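For anyone unfamiliar with the QLoRA mention: the base weights stay frozen (and, in QLoRA, quantized to 4-bit), and only a low-rank adapter B·A of rank r is trained, scaled by alpha/r. A hedged pure-Python sketch of that merge step (matrix sizes and names are illustrative; real training uses a library like peft):

```python
# Sketch of the LoRA update at the heart of QLoRA: the effective weight is
# W + (alpha / r) * B @ A, where W is frozen and only A, B are trained.
# Plain lists-of-lists keep the example dependency-free.

def matmul(A, B):
    """Naive matrix multiply on lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def lora_merge(W, A, B, alpha, r):
    """Return W + (alpha / r) * B @ A, the effective weight after merging."""
    delta = matmul(B, A)           # low-rank update, shape matches W
    s = alpha / r
    return [[w + s * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]       # frozen 2x2 base weight
A = [[1.0, 2.0]]                   # r x d_in, rank r = 1
B = [[0.5], [1.0]]                 # d_out x r
print(lora_merge(W, A, B, alpha=2, r=1))
```

Because only A and B carry gradients, the optimizer state is tiny compared to full fine-tuning, which is why a single epoch over SlimOrca is feasible on modest hardware.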
We find GPT-4 judgments correlate strongly with humans, with human agreement with GPT-4 typically similar or higher than inter-human annotator agreement....
The model is trained on his own Orca-style dataset, as well as some Airoboros, apparently to increase creativity...
These are the full weights; quants are already incoming from TheBloke. Will update this post when they’re fully uploaded...
This is Llama 2 13B with some additional attention heads from original-flavor Llama 33B frankensteined on....
cross-posted from: lemmy.world/post/1760388...
From: https://old.reddit.com/r/LocalLLaMA/comments/14hy369/wizardlm33bv10uncensored/...