Realistic Zero-Shot Cross-Lingual Transfer in Legal Topic Classification
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
We consider zero-shot cross-lingual transfer in legal topic classification using the recent Multi-EURLEX dataset. Since the original dataset contains parallel documents, which is unrealistic for zero-shot cross-lingual transfer, we develop a new version of the dataset without parallel documents. We use it to show that translation-based methods vastly outperform cross-lingual fine-tuning of multilingually pre-trained models, the best previous zero-shot transfer method for Multi-EURLEX. We also develop a bilingual teacher-student zero-shot transfer approach, which exploits additional unlabeled documents of the target language and performs better than a model fine-tuned directly on labeled target language documents.
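The abstract does not say how the bilingual teacher-student approach is trained. The sketch below shows one common way to set up teacher-student transfer for multi-label topic classification. It rests on two assumptions that are not stated in the source. First, a teacher model reads one view of each unlabeled target-language document (for example a source-language translation) and outputs a probability per topic. Second, the student reads the target-language text and learns to match those soft labels with per-topic binary cross-entropy. The names `soft_bce` and `distillation_loss` are hypothetical and do not come from the paper.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_bce(student_logits: Sequence[float],
             teacher_probs: Sequence[float],
             eps: float = 1e-12) -> float:
    """Hypothetical helper: mean binary cross-entropy of the student's
    per-topic sigmoid outputs against the teacher's soft labels
    (multi-label, so each topic is an independent yes/no decision)."""
    if len(student_logits) != len(teacher_probs):
        raise ValueError("student and teacher must score the same topics")
    total = 0.0
    for z, t in zip(student_logits, teacher_probs):
        p = min(max(sigmoid(z), eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(teacher_probs)


def distillation_loss(batch_student_logits: Sequence[Sequence[float]],
                      batch_teacher_probs: Sequence[Sequence[float]]) -> float:
    """Hypothetical helper: average soft_bce over a batch of unlabeled
    target-language documents."""
    losses = [soft_bce(s, t)
              for s, t in zip(batch_student_logits, batch_teacher_probs)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Two unlabeled documents, three topics; teacher probabilities are made up.
    teacher = [[0.9, 0.1, 0.5], [0.2, 0.8, 0.3]]
    student = [[2.0, -2.0, 0.0], [-1.0, 1.5, -0.5]]
    print(round(distillation_loss(student, teacher), 4))
```

The loss is lowest when each student probability equals the teacher's, so the unlabeled target-language documents still provide a training signal. The gradient with respect to each student logit is simply `sigmoid(z) - t`.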
Original language | English |
---|---|
Title of host publication | Proceedings of the 12th Hellenic Conference on Artificial Intelligence, SETN 2022 |
Number of pages | 8 |
Publisher | Association for Computing Machinery, Inc. |
Publication date | 2022 |
Article number | 19 |
ISBN (Electronic) | 9781450395977 |
DOIs | |
Publication status | Published - 2022 |
Event | 12th Hellenic Conference on Artificial Intelligence, SETN 2022 - Corfu, Greece (7 Sep 2022 → 9 Sep 2022) |
Conference
Conference | 12th Hellenic Conference on Artificial Intelligence, SETN 2022 |
---|---|
Country | Greece |
City | Corfu |
Period | 07/09/2022 → 09/09/2022 |
Sponsor | Hellenic Artificial Intelligence Society, Humanistic and Social Informatics Laboratory (HILab), Ionian University, Department of Informatics |
Series | ACM International Conference Proceeding Series |
---|---|
Bibliographical note
Publisher Copyright:
© 2022 ACM.
- legal text classification, natural language processing, zero-shot cross-lingual transfer learning
ID: 342927381