Machine Learning - kbin.social

RecycleGPT: An Autoregressive Language Model with Recyclable Module (arxiv.org)

Existing large language models have to run K times to generate a sequence of K tokens. In this paper, we present RecycleGPT, a generative language model with fast decoding speed by recycling pre-generated model states without running the whole model in multiple steps. Our approach relies on the observation that adjacent tokens...

Universal and Transferable Attacks on Aligned Language Models (llm-attacks.org)

Multitask Pretraining with Structured Knowledge for Text-to-SQL Generation (aclanthology.org)

Many machine learning-based low-code or no-code applications involve generating code that interacts with structured knowledge. For example, one of the most studied tasks in this area is generating SQL code from a natural language statement. Prior work shows that incorporating context information from the database schema, such as...

Retentive Network: A Successor to Transformer for Large Language Models (arxiv.org)

This is an exciting new paper that replaces attention in the Transformer architecture with a set of decomposable matrix operations that retain the modeling capacity of Transformer models, while allowing parallel training and efficient RNN-like inference without the use of attention (it doesn't use a softmax)....

mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs (arxiv.org)

Modular vision-language models (Vision-LLMs) align pretrained image encoders with (pretrained) large language models (LLMs), representing a computationally much more efficient alternative to end-to-end training of large vision-language models from scratch, which is prohibitively expensive for most. Vision-LLMs instead post-hoc...

“Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors (aclanthology.org)

Deep neural networks (DNNs) are often used for text classification due to their high accuracy. However, DNNs can be computationally intensive, requiring millions of parameters and large amounts of labeled data, which can make them expensive to use, to optimize, and to transfer to out-of-distribution (OOD) cases in practice. In...
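
The blurb above is cut off before the method is described, so the following is only a rough sketch of what a compressor-based, parameter-free classifier can look like: gzip compressed lengths feed a normalized compression distance, and a k-nearest-neighbour vote picks the label. The function names (`compressed_len`, `ncd`, `classify`) and the toy training data are assumptions made for illustration, not code from the paper.

```python
import gzip
from collections import Counter
from typing import List, Tuple


def compressed_len(text: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two strings."""
    ca, cb = compressed_len(a), compressed_len(b)
    cab = compressed_len(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)


def classify(text: str, train: List[Tuple[str, str]], k: int = 1) -> str:
    """Label text by majority vote among its k nearest training examples."""
    nearest = sorted(train, key=lambda pair: ncd(text, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data, only to show the call shape.
train_set = [
    ("the cat sat on the mat " * 20, "pets"),
    ("quantum chromodynamics lattice gauge theory " * 20, "physics"),
]
prediction = classify("the cat sat on the mat", train_set)
```

Nothing is trained here; the only "model" is the compressor, which is why such an approach has no parameters to tune.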

Introducing Keras Core: Keras for TensorFlow, JAX, and PyTorch. (keras.io)

Keras 3.0 now works with TensorFlow, JAX, and PyTorch. It also introduces a bunch of new features. Check it out.

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time (proceedings.mlr.press)

The conventional recipe for maximizing model accuracy is to (1) train multiple models with various hyperparameters and (2) pick the individual model which performs best on a held-out validation set, discarding the remainder. In this paper, we revisit the second step of this procedure in the context of fine-tuning large...
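
As the title says, the idea is to average the weights of several fine-tuned models instead of keeping only the best one. A minimal sketch of a uniform average follows, assuming each checkpoint is a plain dict mapping parameter names to flat lists of floats. The `uniform_soup` name and this checkpoint layout are assumptions for illustration; a real framework would store tensors instead.

```python
from typing import Dict, List

Checkpoint = Dict[str, List[float]]


def uniform_soup(checkpoints: List[Checkpoint]) -> Checkpoint:
    """Average every parameter element-wise across all checkpoints."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    names = checkpoints[0].keys()
    for ckpt in checkpoints:
        if ckpt.keys() != names:
            raise ValueError("checkpoints must share the same parameter names")
    n = len(checkpoints)
    soup: Checkpoint = {}
    for name in names:
        size = len(checkpoints[0][name])
        if any(len(ckpt[name]) != size for ckpt in checkpoints):
            raise ValueError(f"parameter {name!r} has mismatched sizes")
        # zip(*...) groups the i-th element of this parameter from each model.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        soup[name] = [sum(col) / n for col in columns]
    return soup


# Two hypothetical fine-tuned checkpoints of the same architecture.
souped = uniform_soup([{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}])
```

The result is a single set of weights, so inference costs the same as for one model, unlike an ensemble that runs every member.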

NeurIPS 2023 Machine Unlearning Challenge (unlearning-challenge.github.io)

Deep neural networks are at the center of rapid progress in AI, with applications to computer vision, natural language processing, speech recognition and others. While this progress offers many exciting opportunities, it also introduces new challenges, as we researchers bear the responsibility to understand and mitigate the...

Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages (arxiv.org)

Vision-Language Pre-training (VLP) has advanced the performance of many vision-language tasks, such as image-text retrieval, visual entailment, and visual reasoning. The pre-training mostly utilizes lexical databases and image queries in English. Previous work has demonstrated that the pre-training in English does not transfer...

PoisonGPT: How we hid a lobotomized LLM on Hugging Face to spread fake news (blog.mithrilsecurity.io)

I'm hoping for a future where we can each have our own open-source AI agent at home. Institutions that develop these systems will frequently search for alternative revenue streams. Sneaking misinformation and bias into a model may be one of them. We need ways to guard against that....

GitHub - mazzzystar/Queryable: Run CLIP on iPhone to Search Photos. (github.com)

The open-source version of Queryable, an iOS app that runs the CLIP model on iOS to search the Photos album offline....

CoDi: Generate Anything from Anything All At Once through Composable Diffusion (codi-gen.github.io)

Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training (arxiv.org)

Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models (arxiv.org)

Recently, growing interest has been aroused in extending the multimodal capability of large language models (LLMs), e.g., vision-language (VL) learning, which is regarded as the next milestone of artificial general intelligence. However, existing solutions are prohibitively expensive, which not only need to optimize excessive...

GitHub - PiotrNawrot/nanoT5: Fast & Simple repository for pre-training and fine-tuning T5-style models (github.com)

This repository comprises the code to reproduce the pre-training of a "Large Language Model" (T5) under a limited budget (1xA100 GPU, < 24 hours) in PyTorch. We start from the randomly initialised T5-base-v1.1 (248M parameters) model, and we pre-train it on the English subset of the C4 dataset and then fine-tune it on...

Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing (arxiv.org)

Vision transformers (ViTs) have significantly changed the computer vision landscape and have periodically exhibited superior performance in vision tasks compared to convolutional neural networks (CNNs). Although the jury is still out on which model type is superior, each has unique inductive biases that shape their learning and...

A Survey of Vision-Language Pre-training from the Lens of Multimodal Machine Translation (arxiv.org)

Large language models such as BERT and the GPT series started a paradigm shift that calls for building general-purpose models via pre-training on large datasets, followed by fine-tuning on task-specific datasets. There is now a plethora of large pre-trained models for Natural Language Processing and Computer Vision. Recently, we...

Vision-Language Models for Vision Tasks: A Survey (arxiv.org)

Most visual recognition studies rely heavily on crowd-labelled data in deep neural networks (DNNs) training, and they usually train a DNN for each single visual recognition task, leading to a laborious and time-consuming visual recognition paradigm. To address the two challenges, Vision-Language Models (VLMs) have been...

Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models (arxiv.org)

OC Voice Conversion With Just Nearest Neighbors (arxiv.org)

TL;DR: want to convert your voice to another person's voice? Or even to a whisper? Or a dog barking? Or to any other random speech clip? Give our new method a try: https://bshall.github.io/knn-vc...

Extending Context Window of Large Language Models via Positional Interpolation (arxiv.org)

Interesting technique to increase the context window of language models by finetuning on a small number of samples after pretraining....
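
A rough sketch of the core idea, assuming rotary position embeddings: rather than feeding the model position indices beyond the range it was trained on, the indices of a longer sequence are scaled down linearly so they stay inside the trained range. The names `interpolate_positions` and `rope_angles` and the default `base` value are assumptions for illustration, and the fine-tuning step mentioned above is not shown.

```python
from typing import List


def interpolate_positions(seq_len: int, trained_len: int) -> List[float]:
    """Rescale positions 0..seq_len-1 so they fall inside [0, trained_len)."""
    # Only shrink: sequences that already fit keep their integer positions.
    scale = min(1.0, trained_len / seq_len)
    return [i * scale for i in range(seq_len)]


def rope_angles(position: float, dim: int, base: float = 10000.0) -> List[float]:
    """Rotation angles of a rotary embedding at a (possibly fractional) position."""
    return [position / base ** (2 * k / dim) for k in range(dim // 2)]


# A model trained on 4 positions asked to read 8 tokens.
positions = interpolate_positions(seq_len=8, trained_len=4)
angles = [rope_angles(p, dim=4) for p in positions]
```

Here the eight tokens receive positions in steps of 0.5, so the largest position is 3.5 and never exceeds what the model saw during pretraining.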

BayesFlow: Amortized Bayesian Workflows With Neural Networks (arxiv.org)

Modern Bayesian inference involves a mixture of computational techniques for estimating, validating, and drawing conclusions from probabilistic models as part of principled workflows for data analysis. Typical problems in Bayesian workflows are the approximation of intractable posterior distributions for diverse model types and...

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks (arxiv.org)

Large language models (LLMs) have notably accelerated progress towards artificial general intelligence (AGI), with their impressive zero-shot capacity for user-tailored tasks, endowing them with immense potential across a range of applications. However, in the field of computer vision, despite the availability of numerous...

Inverse Scaling: When Bigger Isn't Better (arxiv.org)
