Identifying Temporal Trends Based on Perplexity and Clustering: Are We Looking at Language Change?

Research output: Chapter in Book/Report/Conference proceeding > Article in proceedings > Research > peer-review

Standard

Identifying Temporal Trends Based on Perplexity and Clustering: Are We Looking at Language Change? / Boldsen, Sidsel; Aguirrezabal Zabaleta, Manex; Paggio, Patrizia.

Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change. Association for Computational Linguistics, 2019. p. 86-91.


Harvard

Boldsen, S, Aguirrezabal Zabaleta, M & Paggio, P 2019, Identifying Temporal Trends Based on Perplexity and Clustering: Are We Looking at Language Change? in Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change. Association for Computational Linguistics, pp. 86-91, Computational Approaches to Historical Language Change 2019, Florence, Italy, 02/08/2019. https://doi.org/10.18653/v1/W19-4711

APA

Boldsen, S., Aguirrezabal Zabaleta, M., & Paggio, P. (2019). Identifying Temporal Trends Based on Perplexity and Clustering: Are We Looking at Language Change? In Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change (pp. 86-91). Association for Computational Linguistics. https://doi.org/10.18653/v1/W19-4711

Vancouver

Boldsen S, Aguirrezabal Zabaleta M, Paggio P. Identifying Temporal Trends Based on Perplexity and Clustering: Are We Looking at Language Change? In Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change. Association for Computational Linguistics. 2019. p. 86-91 https://doi.org/10.18653/v1/W19-4711

Author

Boldsen, Sidsel; Aguirrezabal Zabaleta, Manex; Paggio, Patrizia. / Identifying Temporal Trends Based on Perplexity and Clustering: Are We Looking at Language Change? Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change. Association for Computational Linguistics, 2019. pp. 86-91.

Bibtex

@inproceedings{da07870575eb49f8ad80096107398228,
title = "Identifying Temporal Trends Based on Perplexity and Clustering: Are We Looking at Language Change?",
abstract = "In this work we propose a data-driven methodology for identifying temporal trends in a corpus of medieval charters. We have used perplexities derived from RNNs as a distance measure between documents and then, performed clustering on those distances. We argue that perplexities calculated by such language models are representative of temporal trends. The clusters produced using the K-Means algorithm give an insight of the differences in language in different time periods at least partly due to language change. We suggest that the temporal distribution of the individual clusters might provide a more nuanced picture of temporal trends compared to discrete bins, thus providing better results when used in a classification task.",
author = "Sidsel Boldsen and {Aguirrezabal Zabaleta}, Manex and Patrizia Paggio",
year = "2019",
doi = "10.18653/v1/W19-4711",
language = "English",
pages = "86--91",
booktitle = "Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change",
publisher = "Association for Computational Linguistics",
note = "Computational Approaches to Historical Language Change 2019 : Workshop co-located with ACL 2019, LChange'19 ; Conference date: 02-08-2019 Through 02-08-2019",
url = "https://languagechange.org/events/2019-acl-lcworkshop/",
}

RIS

TY - GEN
T1 - Identifying Temporal Trends Based on Perplexity and Clustering: Are We Looking at Language Change?
AU - Boldsen, Sidsel
AU - Aguirrezabal Zabaleta, Manex
AU - Paggio, Patrizia
PY - 2019
Y1 - 2019

N2 - In this work we propose a data-driven methodology for identifying temporal trends in a corpus of medieval charters. We have used perplexities derived from RNNs as a distance measure between documents and then, performed clustering on those distances. We argue that perplexities calculated by such language models are representative of temporal trends. The clusters produced using the K-Means algorithm give an insight of the differences in language in different time periods at least partly due to language change. We suggest that the temporal distribution of the individual clusters might provide a more nuanced picture of temporal trends compared to discrete bins, thus providing better results when used in a classification task.

AB - In this work we propose a data-driven methodology for identifying temporal trends in a corpus of medieval charters. We have used perplexities derived from RNNs as a distance measure between documents and then, performed clustering on those distances. We argue that perplexities calculated by such language models are representative of temporal trends. The clusters produced using the K-Means algorithm give an insight of the differences in language in different time periods at least partly due to language change. We suggest that the temporal distribution of the individual clusters might provide a more nuanced picture of temporal trends compared to discrete bins, thus providing better results when used in a classification task.

U2 - 10.18653/v1/W19-4711
DO - 10.18653/v1/W19-4711
M3 - Article in proceedings
SP - 86
EP - 91
BT - Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change
PB - Association for Computational Linguistics
T2 - Computational Approaches to Historical Language Change 2019
Y2 - 2 August 2019 through 2 August 2019
ER -

ID: 227472498