Clustering Monolingual Vocabularies to Improve Cross-Lingual Generalization
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Documents
- Full text: Final published version, 828 KB, PDF document
Multilingual language models exhibit better performance for some languages than for others (Singh et al., 2019), and many languages do not seem to benefit from multilingual sharing at all, presumably as a result of poor multilingual segmentation (Pyysalo et al., 2020). This work explores the idea of learning multilingual language models based on clustering of monolingual segments. We show significant improvements over standard multilingual segmentation and training across nine languages on a question answering task, both in a small model regime and for a model the size of BERT-base.
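The abstract only sketches the approach, so the snippet below is a minimal illustration of the general idea: take the union of subword types from per-language monolingual vocabularies, cluster them, and let cluster IDs rather than raw subword IDs index the shared embedding matrix. The vocabularies, embeddings, and cluster count here are hypothetical stand-ins, and the paper's actual clustering criterion and vocabulary sizes are not reproduced.

```python
# Illustrative sketch only; not the paper's exact procedure.
# Idea: segment each language with its own monolingual vocabulary, then
# cluster the union of subword types so that clusters (not raw subwords)
# become the shared multilingual vocabulary entries.
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical monolingual vocabularies (in practice, one tokenizer
# trained per language, e.g. BPE or a unigram LM).
mono_vocabs = {
    "en": ["_the", "_hous", "ing", "ed"],
    "da": ["_hus", "_og", "et", "ene"],
}

# Hypothetical subword embeddings; a real system would derive these from
# data (e.g. co-occurrence statistics or pretrained subword vectors),
# not from random initialization as done here for brevity.
all_subwords = [(lang, sw) for lang, vocab in mono_vocabs.items() for sw in vocab]
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(all_subwords), 32))

# Cluster the pooled monolingual subwords; each cluster ID replaces the
# raw subword ID, so related segments across languages can share a row
# of the model's embedding matrix.
n_clusters = 4  # assumption; a real shared vocabulary would use thousands
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(embeddings)
cluster_of = {key: int(label) for key, label in zip(all_subwords, kmeans.labels_)}

def encode(lang: str, segments: list[str]) -> list[int]:
    """Map a monolingually segmented sentence to shared cluster IDs."""
    return [cluster_of[(lang, sw)] for sw in segments]

print(encode("da", ["_hus", "et"]))
```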
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 1st Workshop on Multilingual Representation Learning |
| Publisher | Association for Computational Linguistics |
| Publication date | 2021 |
| Pages | 32–40 |
| Publication status | Published - 2021 |
| Event | 1st Workshop on Multilingual Representation Learning, Online, 11 Nov 2021 → 11 Nov 2021 |
Conference

| Conference | 1st Workshop on Multilingual Representation Learning |
|---|---|
| City | Online |
| Period | 11/11/2021 → 11/11/2021 |