A statistical model for grammar mapping

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

A statistical model for grammar mapping. / Basirat, A.; Faili, H.; Nivre, J.

In: Natural Language Engineering, Vol. 22, No. 2, 01.03.2016, p. 215-255.

Harvard

Basirat, A, Faili, H & Nivre, J 2016, 'A statistical model for grammar mapping', Natural Language Engineering, vol. 22, no. 2, pp. 215-255. https://doi.org/10.1017/S1351324915000017

APA

Basirat, A., Faili, H., & Nivre, J. (2016). A statistical model for grammar mapping. Natural Language Engineering, 22(2), 215-255. https://doi.org/10.1017/S1351324915000017

Vancouver

Basirat A, Faili H, Nivre J. A statistical model for grammar mapping. Natural Language Engineering. 2016 Mar 1;22(2):215-255. https://doi.org/10.1017/S1351324915000017

Author

Basirat, A. ; Faili, H. ; Nivre, J. / A statistical model for grammar mapping. In: Natural Language Engineering. 2016 ; Vol. 22, No. 2. pp. 215-255.

Bibtex

@article{ee40116c1596469cb8a2aad099e5cfcd,

title = "A statistical model for grammar mapping",

abstract = "The two main classes of grammars are (a) hand-crafted grammars, which are developed by language experts, and (b) data-driven grammars, which are extracted from annotated corpora. This paper introduces a statistical method for mapping the elementary structures of a data-driven grammar onto the elementary structures of a hand-crafted grammar in order to combine their advantages. The idea is employed in the context of Lexicalized Tree-Adjoining Grammars (LTAG) and tested on two LTAGs of English: the hand-crafted LTAG developed in the XTAG project, and the data-driven LTAG, which is automatically extracted from the Penn Treebank and used by the MICA parser. We propose a statistical model for mapping any elementary tree sequence of the MICA grammar onto a proper elementary tree sequence of the XTAG grammar. The model has been tested on three subsets of the WSJ corpus that have average lengths of 10, 16, and 18 words, respectively. The experimental results show that full-parse trees with average F1-scores of 72.49, 64.80, and 62.30 points could be built from 94.97%, 96.01%, and 90.25% of the XTAG elementary tree sequences assigned to the subsets, respectively. Moreover, by reducing the amount of syntactic lexical ambiguity of sentences, the proposed model significantly improves the efficiency of parsing in the XTAG system.",

author = "A. Basirat and H. Faili and J. Nivre",

note = "Publisher Copyright: Copyright {\textcopyright} Cambridge University Press 2015.",

year = "2016",

month = mar,

day = "1",

doi = "10.1017/S1351324915000017",

language = "English",

volume = "22",

pages = "215--255",

journal = "Natural Language Engineering",

issn = "1351-3249",

publisher = "Cambridge University Press",

number = "2",

}

RIS

TY - JOUR

T1 - A statistical model for grammar mapping

AU - Basirat, A.

AU - Faili, H.

AU - Nivre, J.

PY - 2016/3/1

Y1 - 2016/3/1

AB - The two main classes of grammars are (a) hand-crafted grammars, which are developed by language experts, and (b) data-driven grammars, which are extracted from annotated corpora. This paper introduces a statistical method for mapping the elementary structures of a data-driven grammar onto the elementary structures of a hand-crafted grammar in order to combine their advantages. The idea is employed in the context of Lexicalized Tree-Adjoining Grammars (LTAG) and tested on two LTAGs of English: the hand-crafted LTAG developed in the XTAG project, and the data-driven LTAG, which is automatically extracted from the Penn Treebank and used by the MICA parser. We propose a statistical model for mapping any elementary tree sequence of the MICA grammar onto a proper elementary tree sequence of the XTAG grammar. The model has been tested on three subsets of the WSJ corpus that have average lengths of 10, 16, and 18 words, respectively. The experimental results show that full-parse trees with average F1-scores of 72.49, 64.80, and 62.30 points could be built from 94.97%, 96.01%, and 90.25% of the XTAG elementary tree sequences assigned to the subsets, respectively. Moreover, by reducing the amount of syntactic lexical ambiguity of sentences, the proposed model significantly improves the efficiency of parsing in the XTAG system.

UR - http://www.scopus.com/inward/record.url?scp=84958037791&partnerID=8YFLogxK

U2 - 10.1017/S1351324915000017

DO - 10.1017/S1351324915000017

M3 - Journal article

AN - SCOPUS:84958037791

VL - 22

SP - 215

EP - 255

JO - Natural Language Engineering

JF - Natural Language Engineering

SN - 1351-3249

IS - 2

ER -
