Everybody’s talking about Mistral, an upstart French challenger to OpenAI (arstechnica.com)

On Monday, Mistral AI announced a new AI language model called Mixtral 8x7B, a "mixture of experts" (MoE) model with open weights that reportedly matches OpenAI's GPT-3.5 in performance, a claim others have made in the past but one that is being taken seriously by AI heavyweights such as OpenAI's Andrej Karpathy...

Mixtral 8x7B can process a 32K token context window and works in French, German, Spanish, Italian, and English. (arstechnica.com)

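The "mixture of experts" design is what lets Mixtral punch above its active-parameter count: for each token, a small router picks 2 of 8 expert feed-forward networks and mixes their outputs by the router's softmax weights. A minimal sketch of that routing pattern in PyTorch (illustrative only, not Mistral's implementation; the layer sizes are placeholders):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Top-k mixture-of-experts layer: route each token to k of n experts."""

    def __init__(self, dim=512, hidden=2048, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        ])
        self.top_k = top_k

    def forward(self, x):                        # x: (tokens, dim)
        scores = self.router(x)                  # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e         # tokens sent to expert e in this slot
                if mask.any():
                    w = weights[mask, slot].unsqueeze(-1)
                    out[mask] += w * expert(x[mask])
        return out
```

Because only 2 of the 8 expert MLPs run for any given token, total parameters and per-token compute decouple: Mixtral's roughly 47B parameters serve at roughly the per-token cost of a ~13B dense model.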

Long Sequence Modeling with XGen: A 7B LLM Trained on 8K Input Sequence Length (blog.salesforceairesearch.com)

TLDR We trained a series of 7B LLMs named XGen-7B with standard dense attention on up to 8K sequence length, for up to 1.5T tokens. We also fine-tune the models on public-domain instructional data. The main takeaways are: * On standard NLP benchmarks, XGen achieves comparable or better results...

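The "standard dense attention" detail is the notable part: dense attention computes an L x L score matrix per head, so raising the training length from the usual 2K to 8K multiplies that cost by 16. A rough back-of-envelope (the head count here assumes a typical 7B-class configuration, not necessarily XGen's exact one):

```python
# Memory for one layer's attention score matrices in fp16.
# Dense attention scales quadratically with sequence length L.
def attn_scores_gib(seq_len, n_heads=32, bytes_per=2):
    return seq_len ** 2 * n_heads * bytes_per / 2 ** 30

for L in (2048, 8192):
    print(f"L={L:5d}: {attn_scores_gib(L):.2f} GiB per layer")
# L= 2048: 0.25 GiB per layer
# L= 8192: 4.00 GiB per layer
```

Fused kernels like FlashAttention avoid materializing the full matrix, but the compute still scales quadratically, which is a big part of why most 7B models of this era stopped at 2K.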

OC Long Context Lengths (And Low Resource Friendly)

Thought I'd ask and see if y'all are familiar with any upcoming models or techniques that I'm not. I'm aware of the MPT-7B-StoryWriter model and the RWKV models that support up to 8192 tokens, but that's about it as far as "long" context lengths go. I also want to run this in a VM with limited resources. The most I will be...
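
For the low-resource angle, the number that grows with context length at inference time is the KV cache. A quick sanity check of whether a given context fits in a VM's RAM (the model shape below assumes a generic 7B-class transformer; swap in the real config of whatever you run):

```python
# Per token, each layer caches a key and a value vector per head:
#   2 (K and V) * n_layers * n_heads * head_dim * bytes_per_element
def kv_cache_gib(context_len, n_layers=32, n_heads=32, head_dim=128, bytes_per=2):
    return 2 * n_layers * n_heads * head_dim * bytes_per * context_len / 2 ** 30

for ctx in (2048, 8192, 32768, 65536):
    print(f"{ctx:6d} tokens -> {kv_cache_gib(ctx):4.1f} GiB KV cache (fp16)")
#   2048 tokens ->  1.0 GiB  ...  65536 tokens -> 32.0 GiB
```

This is also part of why RWKV is attractive here: as a recurrent model it carries a constant-size state instead of a cache that grows with the context.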

The Curse of Recursion: Training on Generated Data Makes Models Forget (arxiv.org)

Stable Diffusion revolutionised image creation from descriptive text. GPT-2, GPT-3(.5) and GPT-4 demonstrated astonishing performance across a variety of language tasks. ChatGPT introduced such language models to the general public. It is now clear that large language models (LLMs) are here to stay, and will bring about drastic...

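The failure mode the paper describes can be reproduced in a toy setting: fit a model to data, sample from the fit, refit on the samples, and repeat. A minimal sketch with a one-dimensional Gaussian standing in for the language model (not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.0   # generation 0: the real data distribution
n = 100                # samples available to each generation

for gen in range(1, 21):
    synthetic = rng.normal(mu, sigma, n)           # sample from the current model
    mu, sigma = synthetic.mean(), synthetic.std()  # refit on generated data only
    print(f"gen {gen:2d}: mu={mu:+.3f}  sigma={sigma:.3f}")

# sigma follows a multiplicative random walk whose expected log-step is
# negative (finite-sample noise plus the downward-biased ML variance
# estimate), so over enough generations the fitted distribution's tails
# vanish. The LLM analogue: rare, low-probability text disappears first.
```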

Meta's Open-Source Massively Multilingual Speech AI Handles over 1,100 Languages (www.infoq.com)

Meta AI open-sourced the Massively Multilingual Speech (MMS) model, which supports automatic speech recognition (ASR) and text-to-speech synthesis (TTS) in over 1,100 languages and language identification (LID) in over 4,000 languages. MMS can outperform existing models while covering nearly 10x as many languages.
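
The checkpoints are on Hugging Face, so trying ASR takes a few lines of transformers code. A sketch following the transformers MMS documentation (the 1B "all" checkpoint plus a per-language adapter; verify the model id against the current hub listing):

```python
import torch
from transformers import AutoProcessor, Wav2Vec2ForCTC

model_id = "facebook/mms-1b-all"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Switch languages by loading the matching adapter and tokenizer vocab;
# MMS uses ISO 639-3 codes ("fra" = French).
processor.tokenizer.set_target_lang("fra")
model.load_adapter("fra")

def transcribe(waveform):
    """waveform: 16 kHz mono float array (e.g. loaded with soundfile)."""
    inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    ids = torch.argmax(logits, dim=-1)[0]
    return processor.decode(ids)
```

LID and TTS ship as separate checkpoints under the same facebook/mms-* namespace.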
