Realistic Zero-Shot Cross-Lingual Transfer in Legal Topic Classification
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Realistic Zero-Shot Cross-Lingual Transfer in Legal Topic Classification. / Xenouleas, Stratos; Tsoukara, Alexia; Panagiotakis, Giannis; Chalkidis, Ilias; Androutsopoulos, Ion.
Proceedings of the 12th Hellenic Conference on Artificial Intelligence, SETN 2022. Association for Computing Machinery, Inc., 2022. 19 (ACM International Conference Proceeding Series).
RIS
TY - GEN
T1 - Realistic Zero-Shot Cross-Lingual Transfer in Legal Topic Classification
AU - Xenouleas, Stratos
AU - Tsoukara, Alexia
AU - Panagiotakis, Giannis
AU - Chalkidis, Ilias
AU - Androutsopoulos, Ion
N1 - Publisher Copyright: © 2022 ACM.
PY - 2022
Y1 - 2022
N2 - We consider zero-shot cross-lingual transfer in legal topic classification using the recent Multi-EURLEX dataset. Since the original dataset contains parallel documents, which is unrealistic for zero-shot cross-lingual transfer, we develop a new version of the dataset without parallel documents. We use it to show that translation-based methods vastly outperform cross-lingual fine-tuning of multilingually pre-trained models, the best previous zero-shot transfer method for Multi-EURLEX. We also develop a bilingual teacher-student zero-shot transfer approach, which exploits additional unlabeled documents of the target language and performs better than a model fine-tuned directly on labeled target language documents.
KW - legal text classification
KW - natural language processing
KW - zero-shot cross-lingual transfer learning
UR - http://www.scopus.com/inward/record.url?scp=85138412890&partnerID=8YFLogxK
U2 - 10.1145/3549737.3549760
DO - 10.1145/3549737.3549760
M3 - Article in proceedings
AN - SCOPUS:85138412890
T3 - ACM International Conference Proceeding Series
BT - Proceedings of the 12th Hellenic Conference on Artificial Intelligence, SETN 2022
PB - Association for Computing Machinery, Inc.
T2 - 12th Hellenic Conference on Artificial Intelligence, SETN 2022
Y2 - 7 September 2022 through 9 September 2022
ER -
ID: 342927381