End-to-end scene text recognition
Research output: Contribution to journal › Conference article › Research › peer-review
Standard
End-to-end scene text recognition. / Wang, Kai; Babenko, Boris; Belongie, Serge.
In: Proceedings of the IEEE International Conference on Computer Vision, 2011, p. 1457-1464.
RIS
TY - GEN
T1 - End-to-end scene text recognition
AU - Wang, Kai
AU - Babenko, Boris
AU - Belongie, Serge
PY - 2011
Y1 - 2011
N2 - This paper focuses on the problem of word detection and recognition in natural images. The problem is significantly more challenging than reading text in scanned documents, and has only recently gained attention from the computer vision community. Sub-components of the problem, such as text detection and cropped image word recognition, have been studied in isolation [7, 4, 20]. However, what is unclear is how these recent approaches contribute to solving the end-to-end problem of word recognition. We fill this gap by constructing and evaluating two systems. The first, representing the de facto state-of-the-art, is a two stage pipeline consisting of text detection followed by a leading OCR engine. The second is a system rooted in generic object recognition, an extension of our previous work in [20]. We show that the latter approach achieves superior performance. While scene text recognition has generally been treated with highly domain-specific methods, our results demonstrate the suitability of applying generic computer vision methods. Adopting this approach opens the door for real world scene text recognition to benefit from the rapid advances that have been taking place in object recognition.
UR - http://www.scopus.com/inward/record.url?scp=84863057818&partnerID=8YFLogxK
U2 - 10.1109/ICCV.2011.6126402
DO - 10.1109/ICCV.2011.6126402
M3 - Conference article
AN - SCOPUS:84863057818
SP - 1457
EP - 1464
JO - Proceedings of the IEEE International Conference on Computer Vision
JF - Proceedings of the IEEE International Conference on Computer Vision
T2 - 2011 IEEE International Conference on Computer Vision, ICCV 2011
Y2 - 6 November 2011 through 13 November 2011
ER -