Visual Prompt Tuning
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Visual Prompt Tuning. / Jia, Menglin; Tang, Luming; Chen, Bor Chun; Cardie, Claire; Belongie, Serge; Hariharan, Bharath; Lim, Ser Nam.
Computer Vision – ECCV 2022: 17th European Conference, Proceedings. ed. / Shai Avidan; Gabriel Brostow; Moustapha Cissé; Giovanni Maria Farinella; Tal Hassner. Springer, 2022. p. 709-727 (Lecture Notes in Computer Science, Vol. 13693).
RIS
TY - GEN
T1 - Visual Prompt Tuning
AU - Jia, Menglin
AU - Tang, Luming
AU - Chen, Bor Chun
AU - Cardie, Claire
AU - Belongie, Serge
AU - Hariharan, Bharath
AU - Lim, Ser Nam
N1 - Publisher Copyright: © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - The current modus operandi in adapting pre-trained models involves updating all the backbone parameters, i.e., full fine-tuning. This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision. Taking inspiration from recent advances in efficiently tuning large language models, VPT introduces only a small amount (less than 1% of model parameters) of trainable parameters in the input space while keeping the model backbone frozen. Via extensive experiments on a wide variety of downstream recognition tasks, we show that VPT achieves significant performance gains compared to other parameter efficient tuning protocols. Most importantly, VPT even outperforms full fine-tuning in many cases across model capacities and training data scales, while reducing per-task storage cost. Code is available at github.com/kmnp/vpt.
U2 - 10.1007/978-3-031-19827-4_41
DO - 10.1007/978-3-031-19827-4_41
M3 - Article in proceedings
AN - SCOPUS:85142715871
SN - 978-3-031-19826-7
T3 - Lecture Notes in Computer Science
SP - 709
EP - 727
BT - Computer Vision – ECCV 2022
A2 - Avidan, Shai
A2 - Brostow, Gabriel
A2 - Cissé, Moustapha
A2 - Farinella, Giovanni Maria
A2 - Hassner, Tal
PB - Springer
T2 - 17th European Conference on Computer Vision, ECCV 2022
Y2 - 23 October 2022 through 27 October 2022
ER -
ID: 342671827