A Transformer-based Parser for Syriac Morphology
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
A Transformer-based Parser for Syriac Morphology. / Naaijer, Martijn; Sikkel, Constantijn; Coeckelbergs, Mathias; Attema, Jisk; Van Peursen, Willem.
Proceedings of the Ancient Language Processing Workshop associated with the 14th International Conference on Recent Advances in Natural Language Processing RANLP 2023. Varna, Bulgaria, 2023. p. 23-29.
RIS
TY - GEN
T1 - A Transformer-based Parser for Syriac Morphology
AU - Naaijer, Martijn
AU - Sikkel, Constantijn
AU - Coeckelbergs, Mathias
AU - Attema, Jisk
AU - Van Peursen, Willem
PY - 2023
Y1 - 2023
N2 - In this project we train a Transformer-based model from scratch, with the goal of parsing the morphology of Ancient Syriac texts as accurately as possible. Syriac is a low-resource language, and only a relatively small training set was available. Therefore, the training set was expanded by adding Biblical Hebrew data to it. Five different experiments were done: the model was trained on Syriac data only, it was trained with mixed Syriac and (un)vocalized Hebrew data, and it was trained first on (un)vocalized Hebrew data and then trained further on Syriac data. The models trained on Hebrew and Syriac data consistently outperform the models trained on Syriac data only. This shows that the differences between Syriac and Hebrew are small enough that it is worth adding Hebrew data to train the model for parsing Syriac morphology. Training models with data from multiple languages is an important trend in NLP; we show that this works well for relatively small datasets of Syriac and Hebrew.
AB - In this project we train a Transformer-based model from scratch, with the goal of parsing the morphology of Ancient Syriac texts as accurately as possible. Syriac is a low-resource language, and only a relatively small training set was available. Therefore, the training set was expanded by adding Biblical Hebrew data to it. Five different experiments were done: the model was trained on Syriac data only, it was trained with mixed Syriac and (un)vocalized Hebrew data, and it was trained first on (un)vocalized Hebrew data and then trained further on Syriac data. The models trained on Hebrew and Syriac data consistently outperform the models trained on Syriac data only. This shows that the differences between Syriac and Hebrew are small enough that it is worth adding Hebrew data to train the model for parsing Syriac morphology. Training models with data from multiple languages is an important trend in NLP; we show that this works well for relatively small datasets of Syriac and Hebrew.
M3 - Article in proceedings
SN - 978-954-452-087-8
SP - 23
EP - 29
BT - Proceedings of the Ancient Language Processing Workshop associated with the 14th International Conference on Recent Advances in Natural Language Processing RANLP 2023
CY - Varna, Bulgaria
ER -
ID: 366755414
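
Note: The abstract above describes the training protocol only, not an implementation. The Python sketch below is a purely illustrative, minimal rendering of the pretrain-then-fine-tune variant (train first on Hebrew, then continue training on Syriac), using a small character-level Transformer encoder that assigns one morphological tag per word. It is not the authors' code: the toy word lists, the one-tag-per-word scheme, the model size, and all hyperparameters are placeholders invented here, and the real task (full ETCBC-style morphological parsing) is richer than this.

# Illustrative sketch only; all data and settings are made up.
import torch
import torch.nn as nn

# Toy stand-ins for transliterated Hebrew and Syriac corpora.
hebrew_data = [("dbr", "noun"), ("qtl", "verb"), ("mlk", "noun")]
syriac_data = [("ktb", "verb"), ("mlk'", "noun")]

chars = sorted({c for w, _ in hebrew_data + syriac_data for c in w})
char2idx = {c: i + 1 for i, c in enumerate(chars)}   # index 0 is reserved for padding
tags = sorted({t for _, t in hebrew_data + syriac_data})
tag2idx = {t: i for i, t in enumerate(tags)}

class CharTagger(nn.Module):
    """Tiny Transformer encoder over characters; mean-pools to one tag per word."""
    def __init__(self, vocab_size, n_tags, d_model=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_model, padding_idx=0)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(d_model, n_tags)

    def forward(self, x):
        # No padding mask for brevity; a real setup would mask padded positions.
        h = self.enc(self.emb(x))
        return self.out(h.mean(dim=1))

def encode(word, max_len=8):
    ids = [char2idx[c] for c in word][:max_len]
    return ids + [0] * (max_len - len(ids))

def train(model, data, epochs=50):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    x = torch.tensor([encode(w) for w, _ in data])
    y = torch.tensor([tag2idx[t] for _, t in data])
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

model = CharTagger(len(char2idx) + 1, len(tags))
train(model, hebrew_data)   # stage 1: train on Hebrew data first
train(model, syriac_data)   # stage 2: continue training on Syriac data

The mixed-data experiments from the abstract would correspond to a single call such as train(model, hebrew_data + syriac_data) instead of the two-stage schedule shown above.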