Multilingual Multimodal Learning with Machine Translated Text

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Multilingual Multimodal Learning with Machine Translated Text. / Qiu, Chen; Oneată, Dan; Bugliarello, Emanuele; Frank, Stella Christina; Elliott, Desmond.

Findings of the Association for Computational Linguistics: EMNLP 2022. Association for Computational Linguistics (ACL), 2022. p. 4178–4193.

Harvard

Qiu, C, Oneată, D, Bugliarello, E, Frank, SC & Elliott, D 2022, Multilingual Multimodal Learning with Machine Translated Text. in Findings of the Association for Computational Linguistics: EMNLP 2022. Association for Computational Linguistics (ACL), pp. 4178–4193, The 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, 07/12/2022. <https://aclanthology.org/2022.findings-emnlp.308/>

APA

Qiu, C., Oneată, D., Bugliarello, E., Frank, S. C., & Elliott, D. (2022). Multilingual Multimodal Learning with Machine Translated Text. In Findings of the Association for Computational Linguistics: EMNLP 2022 (pp. 4178–4193). Association for Computational Linguistics (ACL). https://aclanthology.org/2022.findings-emnlp.308/

Vancouver

Qiu C, Oneată D, Bugliarello E, Frank SC, Elliott D. Multilingual Multimodal Learning with Machine Translated Text. In Findings of the Association for Computational Linguistics: EMNLP 2022. Association for Computational Linguistics (ACL). 2022. p. 4178–4193

Author

Qiu, Chen ; Oneată, Dan ; Bugliarello, Emanuele ; Frank, Stella Christina ; Elliott, Desmond. / Multilingual Multimodal Learning with Machine Translated Text. Findings of the Association for Computational Linguistics: EMNLP 2022. Association for Computational Linguistics (ACL), 2022. pp. 4178–4193

Bibtex

@inproceedings{2477f9a45be44cc3b671d68d7d284c13,

title = "Multilingual Multimodal Learning with Machine Translated Text",

abstract = "Most vision-and-language pretraining research focuses on English tasks. However, the creation of multilingual multimodal evaluation datasets (e.g. Multi30K, xGQA, XVNLI, and MaRVL) poses a new challenge in finding high-quality training data that is both multilingual and multimodal. In this paper, we investigate whether machine translating English multimodal data can be an effective proxy for the lack of readily available multilingual data. We call this framework TD-MML: Translated Data for Multilingual Multimodal Learning, and it can be applied to any multimodal dataset and model. We apply it to both pretraining and fine-tuning data with a state-of-the-art model. In order to prevent models from learning from low-quality translated text, we propose two metrics for automatically removing such translations from the resulting datasets. In experiments on five tasks across 20 languages in the IGLUE benchmark, we show that translated data can provide a useful signal for multilingual multimodal learning, both at pretraining and fine-tuning.",

author = "Chen Qiu and Dan Oneat{\u a} and Emanuele Bugliarello and Frank, {Stella Christina} and Desmond Elliott",

year = "2022",

language = "English",

pages = "4178--4193",

booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",

publisher = "Association for Computational Linguistics (ACL)",

address = "United States",

note = "The 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 ; Conference date: 07-12-2022 Through 11-12-2022",

url = "https://2022.emnlp.org/",

}

RIS

TY - GEN

T1 - Multilingual Multimodal Learning with Machine Translated Text

AU - Qiu, Chen

AU - Oneată, Dan

AU - Bugliarello, Emanuele

AU - Frank, Stella Christina

AU - Elliott, Desmond

N1 - Conference code: 17

PY - 2022

Y1 - 2022

AB - Most vision-and-language pretraining research focuses on English tasks. However, the creation of multilingual multimodal evaluation datasets (e.g. Multi30K, xGQA, XVNLI, and MaRVL) poses a new challenge in finding high-quality training data that is both multilingual and multimodal. In this paper, we investigate whether machine translating English multimodal data can be an effective proxy for the lack of readily available multilingual data. We call this framework TD-MML: Translated Data for Multilingual Multimodal Learning, and it can be applied to any multimodal dataset and model. We apply it to both pretraining and fine-tuning data with a state-of-the-art model. In order to prevent models from learning from low-quality translated text, we propose two metrics for automatically removing such translations from the resulting datasets. In experiments on five tasks across 20 languages in the IGLUE benchmark, we show that translated data can provide a useful signal for multilingual multimodal learning, both at pretraining and fine-tuning.

M3 - Article in proceedings

SP - 4178

EP - 4193

BT - Findings of the Association for Computational Linguistics: EMNLP 2022

PB - Association for Computational Linguistics (ACL)

T2 - The 2022 Conference on Empirical Methods in Natural Language Processing

Y2 - 7 December 2022 through 11 December 2022

ER -
