The Sensitivity of Language Models and Humans to Winograd Schema Perturbations

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

The Sensitivity of Language Models and Humans to Winograd Schema Perturbations. / Abdou, Mostafa; Ravishankar, Vinit; Barrett, Maria; Belinkov, Yonatan; Elliott, Desmond; Søgaard, Anders.

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2020. p. 7590-7604.

Harvard

Abdou, M, Ravishankar, V, Barrett, M, Belinkov, Y, Elliott, D & Søgaard, A 2020, The Sensitivity of Language Models and Humans to Winograd Schema Perturbations. in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, pp. 7590-7604, 58th Annual Meeting of the Association for Computational Linguistics, Online, 05/07/2020. https://doi.org/10.18653/v1/2020.acl-main.679

APA

Abdou, M., Ravishankar, V., Barrett, M., Belinkov, Y., Elliott, D., & Søgaard, A. (2020). The Sensitivity of Language Models and Humans to Winograd Schema Perturbations. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 7590-7604). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.679

Vancouver

Abdou M, Ravishankar V, Barrett M, Belinkov Y, Elliott D, Søgaard A. The Sensitivity of Language Models and Humans to Winograd Schema Perturbations. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics. 2020. p. 7590-7604 https://doi.org/10.18653/v1/2020.acl-main.679

Author

Abdou, Mostafa ; Ravishankar, Vinit ; Barrett, Maria ; Belinkov, Yonatan ; Elliott, Desmond ; Søgaard, Anders. / The Sensitivity of Language Models and Humans to Winograd Schema Perturbations. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2020. pp. 7590-7604

Bibtex

@inproceedings{e83f4751013640248c6a137b735d17a1,

title = "The Sensitivity of Language Models and Humans to Winograd Schema Perturbations",

abstract = "Large-scale pretrained language models are the major driving force behind recent improvements in performance on the Winograd Schema Challenge, a widely employed test of commonsense reasoning ability. We show, however, with a new diagnostic dataset, that these models are sensitive to linguistic perturbations of the Winograd examples that minimally affect human understanding. Our results highlight interesting differences between humans and language models: language models are more sensitive to number or gender alternations and synonym replacements than humans, and humans are more stable and consistent in their predictions, maintain a much higher absolute performance, and perform better on non-associative instances than associative ones.",

author = "Mostafa Abdou and Vinit Ravishankar and Maria Barrett and Yonatan Belinkov and Desmond Elliott and Anders S{\o}gaard",

year = "2020",

doi = "10.18653/v1/2020.acl-main.679",

language = "English",

pages = "7590--7604",

booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",

publisher = "Association for Computational Linguistics",

note = "58th Annual Meeting of the Association for Computational Linguistics ; Conference date: 05-07-2020 Through 10-07-2020",

}

RIS

TY - GEN

T1 - The Sensitivity of Language Models and Humans to Winograd Schema Perturbations

AU - Abdou, Mostafa

AU - Ravishankar, Vinit

AU - Barrett, Maria

AU - Belinkov, Yonatan

AU - Elliott, Desmond

AU - Søgaard, Anders

PY - 2020

Y1 - 2020

N2 - Large-scale pretrained language models are the major driving force behind recent improvements in performance on the Winograd Schema Challenge, a widely employed test of commonsense reasoning ability. We show, however, with a new diagnostic dataset, that these models are sensitive to linguistic perturbations of the Winograd examples that minimally affect human understanding. Our results highlight interesting differences between humans and language models: language models are more sensitive to number or gender alternations and synonym replacements than humans, and humans are more stable and consistent in their predictions, maintain a much higher absolute performance, and perform better on non-associative instances than associative ones.

AB - Large-scale pretrained language models are the major driving force behind recent improvements in performance on the Winograd Schema Challenge, a widely employed test of commonsense reasoning ability. We show, however, with a new diagnostic dataset, that these models are sensitive to linguistic perturbations of the Winograd examples that minimally affect human understanding. Our results highlight interesting differences between humans and language models: language models are more sensitive to number or gender alternations and synonym replacements than humans, and humans are more stable and consistent in their predictions, maintain a much higher absolute performance, and perform better on non-associative instances than associative ones.

U2 - 10.18653/v1/2020.acl-main.679

DO - 10.18653/v1/2020.acl-main.679

M3 - Article in proceedings

SP - 7590

EP - 7604

BT - Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

PB - Association for Computational Linguistics

T2 - 58th Annual Meeting of the Association for Computational Linguistics

Y2 - 5 July 2020 through 10 July 2020

ER -
