The Danish Gigaword Project
Research output: Working paper › Research
Standard
The Danish Gigaword Project. / Strømberg-Derczynski, Leon; Baglini, Rebekah; Christiansen, Morten H.; Ciosici, Manuel R.; Dalsgaard, Jacob Aarup; Fusaroli, Riccardo; Henrichsen, Peter Juel; Hvingelby, Rasmus; Kirkedal, Andreas; Kjeldsen, Alex Speed; Ladefoged, Claus; Nielsen, Finn Årup; Petersen, Malte Lau; Rystrøm, Jonathan Hvithamar; Varab, Daniel.
2020.
RIS
TY - UNPB
T1 - The Danish Gigaword Project
AU - Strømberg-Derczynski, Leon
AU - Baglini, Rebekah
AU - Christiansen, Morten H.
AU - Ciosici, Manuel R.
AU - Dalsgaard, Jacob Aarup
AU - Fusaroli, Riccardo
AU - Henrichsen, Peter Juel
AU - Hvingelby, Rasmus
AU - Kirkedal, Andreas
AU - Kjeldsen, Alex Speed
AU - Ladefoged, Claus
AU - Nielsen, Finn Årup
AU - Petersen, Malte Lau
AU - Rystrøm, Jonathan Hvithamar
AU - Varab, Daniel
PY - 2020
Y1 - 2020
N2 - Danish is a North Germanic/Scandinavian language spoken primarily in Denmark, a country with a tradition of technological and scientific innovation. However, from a technological perspective, the Danish language has received relatively little attention and, as a result, Danish language technology is hard to develop, in part due to a lack of large or broad-coverage Danish corpora. This paper describes the Danish Gigaword project, which aims to construct a freely-available one billion word corpus of Danish text that represents the breadth of the written language.
AB - Danish is a North Germanic/Scandinavian language spoken primarily in Denmark, a country with a tradition of technological and scientific innovation. However, from a technological perspective, the Danish language has received relatively little attention and, as a result, Danish language technology is hard to develop, in part due to a lack of large or broad-coverage Danish corpora. This paper describes the Danish Gigaword project, which aims to construct a freely-available one billion word corpus of Danish text that represents the breadth of the written language.
UR - https://arxiv.org/abs/2005.03521
M3 - Working paper
BT - The Danish Gigaword Project
ER -
ID: 240835080