PaLI-3 Vision Language Models: Smaller, Faster, Stronger

This paper presents PaLI-3, a smaller, faster, and stronger vision language model (VLM) that compares favorably to similar models that are 10x larger. As part of arriving at this strong performance, we compare Vision Transformer (ViT) models pretrained using classification objectives to contrastively (SigLIP) pretrained ones. We find that, while slightly underperforming on standard image classification benchmarks, SigLIP-based PaLI shows superior performance across various multimodal benchmarks, especially on localization and visually-situated text understanding. We scale the SigLIP image encoder up to 2 billion parameters and achieve a new state-of-the-art on multilingual cross-modal retrieval. We hope that PaLI-3, at only 5B parameters, rekindles research on fundamental pieces of complex VLMs, and could fuel a new generation of scaled-up models.
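For readers unfamiliar with the contrastive (SigLIP) pretraining mentioned in the abstract, here is a minimal sketch of a pairwise sigmoid loss as I understand it from the SigLIP paper. It is not the authors' code: the function names, the plain-Python lists, and the default `temperature` and `bias` values are assumptions chosen for illustration, and in practice both scalars would be learned. Each image-text pair in the batch is treated as an independent binary classification, with matching pairs labeled +1 and all other pairs labeled -1.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def pairwise_sigmoid_loss(image_embs, text_embs, temperature=10.0, bias=-10.0):
    """Sigmoid contrastive loss over all image-text pairs in a batch.

    image_embs[i] and text_embs[i] are assumed to be a matching pair.
    temperature and bias are illustrative defaults (assumption), not
    values taken from the thread above.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    total = 0.0
    for i, img in enumerate(imgs):
        for j, txt in enumerate(txts):
            similarity = sum(a * b for a, b in zip(img, txt))
            logit = temperature * similarity + bias
            label = 1.0 if i == j else -1.0  # +1 on the diagonal, -1 elsewhere
            total += log_sigmoid(label * logit)
    # Average over the number of images in the batch.
    return -total / len(imgs)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
    swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
    print(pairwise_sigmoid_loss(images, aligned_texts))
    print(pairwise_sigmoid_loss(images, swapped_texts))
```

Because every pair is scored independently, the loss needs no softmax normalization across the batch. The aligned batch in the usage example should score a lower loss than the swapped one.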


nirogu, 7 months ago

Impressive results! I only wish they had shared some code or some other way to replicate the experiments easily.


KingsmanVince, 7 months ago

Indeed, it would be great if the authors did so. I personally found some unofficial implementations:

https://github.com/kyegomez/PALI

https://github.com/ahmdtaha/distributed_sigmoid_loss


KingsmanVince, 7 months ago

SigLIP

PaLI

PaLI-X
