Please don't post links to reddit discussions.

What's In My Big Data? (arxiv.org)

Large text corpora are the backbone of language models. However, we have a limited understanding of the content of these corpora, including general statistics, quality, social factors, and inclusion of evaluation data (contamination). In this work, we propose What's In My Big Data? (WIMBD), a platform and a set of sixteen...

The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI (arxiv.org)

The race to train language models on vast, diverse, and inconsistently documented datasets has raised pressing concerns about the legal and ethical risks for practitioners. To remedy these practices threatening data transparency and understanding, we convene a multi-disciplinary effort between legal and machine learning experts...

GPT-4 Doesn't Know It's Wrong: An Analysis of Iterative Prompting for Reasoning Problems (arxiv.org)

There has been considerable divergence of opinion on the reasoning abilities of Large Language Models (LLMs). While the initial optimism that reasoning might emerge automatically with scale has been tempered thanks to a slew of counterexamples, a widespread belief in their iterative self-critique capabilities persists. In this...

A Long Way to Go: Investigating Length Correlations in RLHF (arxiv.org)

Great successes have been reported using Reinforcement Learning from Human Feedback (RLHF) to align large language models. Open-source preference datasets and reward models have enabled wider experimentation beyond generic chat settings, particularly to make systems more "helpful" for tasks like web question answering,...

Think before you speak: Training Language Models With Pause Tokens (arxiv.org)

Language models generate responses by producing a series of tokens in immediate succession: the $(K+1)^{th}$ token is an outcome of manipulating $K$ hidden vectors per layer, one vector per preceding token. What if instead we were to let the model manipulate say, $K+10$ hidden vectors, before it outputs the $(K+1)^{th}$ token?...
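A minimal sketch of the core idea (the paper's actual training recipe uses learnable pause embeddings and modified training objectives; the token name and helper below are assumptions for illustration): append extra `<pause>` tokens to the input so the model gets additional positions of computation before it must emit the next real token.

```python
# Hypothetical sketch of pause-token input construction. The string token
# and function name are illustrative, not the paper's implementation.
PAUSE = "<pause>"

def with_pauses(prompt_tokens, num_pauses=10):
    """Give the model `num_pauses` extra hidden vectors per layer to
    manipulate before it must produce the (K+1)-th real token."""
    return prompt_tokens + [PAUSE] * num_pauses

tokens = ["What", "is", "2", "+", "2", "?"]          # K = 6 input tokens
padded = with_pauses(tokens, num_pauses=3)           # model now sees K + 3
```

At inference, outputs at the pause positions are simply ignored; decoding begins only after the last pause token.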

Language Modeling Is Compression (arxiv.org)

It has long been established that predictive models can be transformed into lossless compressors and vice versa. Incidentally, in recent years, the machine learning community has focused on training increasingly large and powerful self-supervised (language) models. Since these large language models exhibit impressive predictive...
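The prediction-compression link is Shannon's: an arithmetic coder driven by a model's conditional probabilities can losslessly encode a sequence in roughly the model's total negative log-likelihood in bits. A toy illustration with a fixed unigram model (the model and numbers are assumptions for illustration, not from the paper):

```python
import math

# Ideal compressed length under a predictive model: sum of -log2 p(token).
# A better predictor (higher probabilities on the observed tokens) therefore
# means a shorter code, which is the sense in which LM quality = compression.
def code_length_bits(sequence, prob):
    """Shannon code length of `sequence` under unigram model `prob`."""
    return sum(-math.log2(prob[tok]) for tok in sequence)

model = {"a": 0.5, "b": 0.25, "c": 0.25}
bits = code_length_bits("aab", model)   # 1 + 1 + 2 = 4 bits
```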

Retentive Network: A Successor to Transformer for Large Language Models (arxiv.org)

This is an exciting new paper that replaces attention in the Transformer architecture with a set of decomposable matrix operations (no softmax), retaining the modeling capacity of Transformer models while allowing parallel training and efficient RNN-like inference....

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time (proceedings.mlr.press)

The conventional recipe for maximizing model accuracy is to (1) train multiple models with various hyperparameters and (2) pick the individual model which performs best on a held-out validation set, discarding the remainder. In this paper, we revisit the second step of this procedure in the context of fine-tuning large...
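The simplest variant, a uniform soup, is just an element-wise mean over the fine-tuned checkpoints. A minimal sketch using plain dicts of float lists (real code would average framework state dicts, e.g. torch tensors; that setup is assumed, not shown):

```python
# Uniform soup: average parameters across several models fine-tuned from the
# same initialization with different hyperparameters, instead of picking one.
def uniform_soup(state_dicts):
    """Element-wise mean of each parameter vector across all models."""
    n = len(state_dicts)
    return {
        key: [sum(vals) / n for vals in zip(*(sd[key] for sd in state_dicts))]
        for key in state_dicts[0]
    }

m1 = {"w": [1.0, 2.0]}
m2 = {"w": [3.0, 4.0]}
soup = uniform_soup([m1, m2])   # {"w": [2.0, 3.0]}
```

The paper's "greedy soup" variant adds models to the average one at a time, keeping each only if held-out accuracy improves.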


Averaging model weights seems to help across textual domains as well, see Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models and Scaling Expert Language Models with Unsupervised Domain Discovery. I wonder if the two types of averaging (across hyperparameters and across domains) can be combined to produce even better models.


Research into efficient optimization techniques seems pretty important given the scale of LLMs these days. Nice to see a second-order approach that achieves reasonable wall-clock improvements.

Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing (arxiv.org)

Vision transformers (ViTs) have significantly changed the computer vision landscape and have periodically exhibited superior performance in vision tasks compared to convolutional neural networks (CNNs). Although the jury is still out on which model type is superior, each has unique inductive biases that shape their learning and...
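The augmentation itself is straightforward to sketch: splice patches from one training image into another so the CNN learns, like a ViT, not to be derailed by out-of-context patches. The parameters and function below are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

# Hypothetical patch-mixing sketch: replace a random fraction of
# non-overlapping patches in image A with the corresponding patches from
# image B. Labels would typically be mixed or left as A's (detail assumed).
def patch_mix(img_a, img_b, patch=4, frac=0.25, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    out = img_a.copy()
    H, W = img_a.shape[:2]
    grid = [(i, j) for i in range(0, H, patch) for j in range(0, W, patch)]
    n_swap = int(len(grid) * frac)
    for idx in rng.choice(len(grid), size=n_swap, replace=False):
        i, j = grid[idx]
        out[i:i + patch, j:j + patch] = img_b[i:i + patch, j:j + patch]
    return out
```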


Please don't post links to reddit.


If there isn't any discussion on reddit (as is the case here), I don't see a reason to link to reddit; you can just link to the project page. That said, if you think there's important discussion happening that helps in understanding the paper, then use a teddit link instead, like:



That's appreciated!


It seems like metrics for creative text generation tasks have been shown to be deficient; this even holds for the newer model-based metrics. That leaves human evaluation (both intrinsic and extrinsic) as the gold standard for those types of tasks. I wonder if the results from this paper (and future papers that look at automatic CV metrics) will lead reviewers to demand more human evaluation in CV tasks, as they do for certain NLP tasks.

Koffindodjer, to machinelearning
@Koffindodjer@mastodon.social

@machinelearning am I in the right place? Lol


@Koffindodjer indeed you are!


do you have a link?


hmmm... not sure which model you're referring to. do you have a paper link?

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks (arxiv.org)

Large language models (LLMs) have notably accelerated progress towards artificial general intelligence (AGI), with their impressive zero-shot capacity for user-tailored tasks, endowing them with immense potential across a range of applications. However, in the field of computer vision, despite the availability of numerous...
