Visual Prompt Tuning
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Visual Prompt Tuning. / Jia, Menglin; Tang, Luming; Chen, Bor Chun; Cardie, Claire; Belongie, Serge; Hariharan, Bharath; Lim, Ser Nam.
Computer Vision – ECCV 2022: 17th European Conference, Proceedings. ed. / Shai Avidan; Gabriel Brostow; Moustapha Cissé; Giovanni Maria Farinella; Tal Hassner. Springer, 2022. p. 709-727 (Lecture Notes in Computer Science, Vol. 13693).
RIS
TY - GEN
T1 - Visual Prompt Tuning
AU - Jia, Menglin
AU - Tang, Luming
AU - Chen, Bor Chun
AU - Cardie, Claire
AU - Belongie, Serge
AU - Hariharan, Bharath
AU - Lim, Ser Nam
N1 - Publisher Copyright: © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - The current modus operandi in adapting pre-trained models involves updating all the backbone parameters, i.e., full fine-tuning. This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision. Taking inspiration from recent advances in efficiently tuning large language models, VPT introduces only a small amount (less than 1% of model parameters) of trainable parameters in the input space while keeping the model backbone frozen. Via extensive experiments on a wide variety of downstream recognition tasks, we show that VPT achieves significant performance gains compared to other parameter efficient tuning protocols. Most importantly, VPT even outperforms full fine-tuning in many cases across model capacities and training data scales, while reducing per-task storage cost. Code is available at github.com/kmnp/vpt.
U2 - 10.1007/978-3-031-19827-4_41
DO - 10.1007/978-3-031-19827-4_41
M3 - Article in proceedings
AN - SCOPUS:85142715871
SN - 978-3-031-19826-7
T3 - Lecture Notes in Computer Science
SP - 709
EP - 727
BT - Computer Vision – ECCV 2022
A2 - Avidan, Shai
A2 - Brostow, Gabriel
A2 - Cissé, Moustapha
A2 - Farinella, Giovanni Maria
A2 - Hassner, Tal
PB - Springer
T2 - 17th European Conference on Computer Vision, ECCV 2022
Y2 - 23 October 2022 through 27 October 2022
ER -
ID: 342671827