Vision Language Transformers: A Survey

Vision language tasks, such as answering questions about an image or generating a caption that describes it, are difficult for computers to perform. A relatively recent body of research has adapted the pretrained transformer architecture introduced in \citet{vaswani2017attention} to vision language modeling. Transformer models have greatly improved performance and versatility over previous vision language models. They do so by pretraining on large generic datasets and transferring what they learn to new tasks with minor changes in architecture and parameter values. This type of transfer learning has become the standard modeling practice in both natural language processing and computer vision. Vision language transformers offer the promise of producing similar advancements in tasks that require both vision and language. In this paper, we provide a broad synthesis of the currently available research on vision language transformer models and offer some analysis of their strengths, their limitations, and the open questions that remain.
