End-to-end scene text recognition

Research output: Contribution to journal › Conference article › Research › peer-review

Standard

End-to-end scene text recognition. / Wang, Kai; Babenko, Boris; Belongie, Serge.

In: Proceedings of the IEEE International Conference on Computer Vision, 2011, pp. 1457-1464.

Harvard

Wang, K, Babenko, B & Belongie, S 2011, 'End-to-end scene text recognition', Proceedings of the IEEE International Conference on Computer Vision, pp. 1457-1464. https://doi.org/10.1109/ICCV.2011.6126402

APA

Wang, K., Babenko, B., & Belongie, S. (2011). End-to-end scene text recognition. Proceedings of the IEEE International Conference on Computer Vision, 1457-1464. https://doi.org/10.1109/ICCV.2011.6126402

Vancouver

Wang K, Babenko B, Belongie S. End-to-end scene text recognition. Proceedings of the IEEE International Conference on Computer Vision. 2011;1457-1464. https://doi.org/10.1109/ICCV.2011.6126402

Author

Wang, Kai; Babenko, Boris; Belongie, Serge. / End-to-end scene text recognition. In: Proceedings of the IEEE International Conference on Computer Vision. 2011; pp. 1457-1464.

Bibtex

@inproceedings{a6cd8100995a4b1baef406c787b6b1af,

title = "End-to-end scene text recognition",

abstract = "This paper focuses on the problem of word detection and recognition in natural images. The problem is significantly more challenging than reading text in scanned documents, and has only recently gained attention from the computer vision community. Sub-components of the problem, such as text detection and cropped image word recognition, have been studied in isolation [7, 4, 20]. However, what is unclear is how these recent approaches contribute to solving the end-to-end problem of word recognition. We fill this gap by constructing and evaluating two systems. The first, representing the de facto state-of-the-art, is a two stage pipeline consisting of text detection followed by a leading OCR engine. The second is a system rooted in generic object recognition, an extension of our previous work in [20]. We show that the latter approach achieves superior performance. While scene text recognition has generally been treated with highly domain-specific methods, our results demonstrate the suitability of applying generic computer vision methods. Adopting this approach opens the door for real world scene text recognition to benefit from the rapid advances that have been taking place in object recognition.",

author = "Kai Wang and Boris Babenko and Serge Belongie",

year = "2011",

doi = "10.1109/ICCV.2011.6126402",

language = "English",

pages = "1457--1464",

booktitle = "Proceedings of the IEEE International Conference on Computer Vision",

note = "2011 IEEE International Conference on Computer Vision, ICCV 2011; Conference date: 06-11-2011 through 13-11-2011",

}

RIS

TY  - GEN
T1  - End-to-end scene text recognition
AU  - Wang, Kai
AU  - Babenko, Boris
AU  - Belongie, Serge
PY  - 2011
Y1  - 2011
N2  - This paper focuses on the problem of word detection and recognition in natural images. The problem is significantly more challenging than reading text in scanned documents, and has only recently gained attention from the computer vision community. Sub-components of the problem, such as text detection and cropped image word recognition, have been studied in isolation [7, 4, 20]. However, what is unclear is how these recent approaches contribute to solving the end-to-end problem of word recognition. We fill this gap by constructing and evaluating two systems. The first, representing the de facto state-of-the-art, is a two stage pipeline consisting of text detection followed by a leading OCR engine. The second is a system rooted in generic object recognition, an extension of our previous work in [20]. We show that the latter approach achieves superior performance. While scene text recognition has generally been treated with highly domain-specific methods, our results demonstrate the suitability of applying generic computer vision methods. Adopting this approach opens the door for real world scene text recognition to benefit from the rapid advances that have been taking place in object recognition.
AB  - This paper focuses on the problem of word detection and recognition in natural images. The problem is significantly more challenging than reading text in scanned documents, and has only recently gained attention from the computer vision community. Sub-components of the problem, such as text detection and cropped image word recognition, have been studied in isolation [7, 4, 20]. However, what is unclear is how these recent approaches contribute to solving the end-to-end problem of word recognition. We fill this gap by constructing and evaluating two systems. The first, representing the de facto state-of-the-art, is a two stage pipeline consisting of text detection followed by a leading OCR engine. The second is a system rooted in generic object recognition, an extension of our previous work in [20]. We show that the latter approach achieves superior performance. While scene text recognition has generally been treated with highly domain-specific methods, our results demonstrate the suitability of applying generic computer vision methods. Adopting this approach opens the door for real world scene text recognition to benefit from the rapid advances that have been taking place in object recognition.
UR  - http://www.scopus.com/inward/record.url?scp=84863057818&partnerID=8YFLogxK
U2  - 10.1109/ICCV.2011.6126402
DO  - 10.1109/ICCV.2011.6126402
M3  - Conference article
AN  - SCOPUS:84863057818
SP  - 1457
EP  - 1464
JO  - Proceedings of the IEEE International Conference on Computer Vision
JF  - Proceedings of the IEEE International Conference on Computer Vision
T2  - 2011 IEEE International Conference on Computer Vision, ICCV 2011
Y2  - 6 November 2011 through 13 November 2011
ER  -