Visual recognition with humans in the loop
Research output: Contribution to journal › Conference article › Research › peer-review
Standard
Visual recognition with humans in the loop. / Branson, Steve; Wah, Catherine; Schroff, Florian; Babenko, Boris; Welinder, Peter; Perona, Pietro; Belongie, Serge.
In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), No. PART 4, 2010, p. 438-451.
RIS
TY - GEN
T1 - Visual recognition with humans in the loop
AU - Branson, Steve
AU - Wah, Catherine
AU - Schroff, Florian
AU - Babenko, Boris
AU - Welinder, Peter
AU - Perona, Pietro
AU - Belongie, Serge
PY - 2010
Y1 - 2010
AB - We present an interactive, hybrid human-computer method for object classification. The method applies to classes of objects that are recognizable by people with appropriate expertise (e.g., animal species or airplane model), but not (in general) by people without such expertise. It can be seen as a visual version of the 20 questions game, where questions based on simple visual attributes are posed interactively. The goal is to identify the true class while minimizing the number of questions asked, using the visual content of the image. We introduce a general framework for incorporating almost any off-the-shelf multi-class object recognition algorithm into the visual 20 questions game, and provide methodologies to account for imperfect user responses and unreliable computer vision algorithms. We evaluate our methods on Birds-200, a difficult dataset of 200 tightly-related bird species, and on the Animals With Attributes dataset. Our results demonstrate that incorporating user input drives up recognition accuracy to levels that are good enough for practical applications, while at the same time, computer vision reduces the amount of human interaction required.
UR - http://www.scopus.com/inward/record.url?scp=78149300909&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-15561-1_32
DO - 10.1007/978-3-642-15561-1_32
M3 - Conference article
AN - SCOPUS:78149300909
SP - 438
EP - 451
JO - Lecture Notes in Computer Science
JF - Lecture Notes in Computer Science
SN - 0302-9743
IS - PART 4
T2 - 11th European Conference on Computer Vision, ECCV 2010
Y2 - 5 September 2010 through 11 September 2010
ER -
ID: 302048098