Finetune Like You Pretrain: Improved Finetuning of Zero-Shot Vision Models (openaccess.thecvf.com)

Finetuning image-text models such as CLIP achieves state-of-the-art accuracies on a variety of benchmarks. However, recent works (Kumar et al., 2022; Wortsman et al., 2021) have shown that even subtle differences in the finetuning process can lead to surprisingly large differences in the final performance, both for...