Cross-Lingual QA as a Stepping Stone for Monolingual Open QA in Icelandic
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Cross-Lingual QA as a Stepping Stone for Monolingual Open QA in Icelandic. / Snæbjarnarson, Vésteinn; Einarsson, Hafsteinn.
MIA 2022 - Workshop on Multilingual Information Access, Proceedings of the Workshop. ed. / Akari Asai; Eunsol Choi; Jonathan H. Clark; Junjie Hu; Chia-Hsuan Lee; Jungo Kasai; Shayne Longpre; Ikuya Yamada; Rui Zhang. Association for Computational Linguistics (ACL), 2022. p. 29-36.
RIS
TY - GEN
T1 - Cross-Lingual QA as a Stepping Stone for Monolingual Open QA in Icelandic
AU - Snæbjarnarson, Vésteinn
AU - Einarsson, Hafsteinn
N1 - Publisher Copyright: © 2022 Association for Computational Linguistics.
PY - 2022
Y1 - 2022
N2 - It can be challenging to build effective open question answering (open QA) systems for languages other than English, mainly due to a lack of labeled data for training. We present a data-efficient method to bootstrap such a system for languages other than English. Our approach requires only limited QA resources in the given language, along with machine-translated data, and at least a bilingual language model. To evaluate our approach, we build such a system for the Icelandic language and evaluate performance over trivia-style datasets. The corpora used for training are English in origin but machine-translated into Icelandic. We train a bilingual Icelandic/English language model to embed English context and Icelandic questions following the methodology introduced with DensePhrases (Lee et al., 2021). The resulting system is an open-domain cross-lingual QA system between Icelandic and English. Finally, the system is adapted for Icelandic-only open QA, demonstrating how it is possible to efficiently create an open QA system with limited access to curated datasets in the language of interest.
UR - http://www.scopus.com/inward/record.url?scp=85139142520&partnerID=8YFLogxK
M3 - Article in proceedings
AN - SCOPUS:85139142520
SP - 29
EP - 36
BT - MIA 2022 - Workshop on Multilingual Information Access, Proceedings of the Workshop
A2 - Asai, Akari
A2 - Choi, Eunsol
A2 - Clark, Jonathan H.
A2 - Hu, Junjie
A2 - Lee, Chia-Hsuan
A2 - Kasai, Jungo
A2 - Longpre, Shayne
A2 - Yamada, Ikuya
A2 - Zhang, Rui
PB - Association for Computational Linguistics (ACL)
T2 - 2022 Workshop on Multilingual Information Access, MIA 2022
Y2 - 15 July 2022
ER -