Probing Cross-Modal Representations in Multi-Step Relational Reasoning

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Probing Cross-Modal Representations in Multi-Step Relational Reasoning. / Parfenova, Iuliia; Elliott, Desmond; Fernández, Raquel; Pezzelle, Sandro.

Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021). Association for Computational Linguistics, 2021. p. 152-162.

Harvard

Parfenova, I, Elliott, D, Fernández, R & Pezzelle, S 2021, Probing Cross-Modal Representations in Multi-Step Relational Reasoning. in Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021). Association for Computational Linguistics, pp. 152-162, 6th Workshop on Representation Learning for NLP (RepL4NLP-2021), Online, 01/08/2021. https://doi.org/10.18653/v1/2021.repl4nlp-1.16

APA

Parfenova, I., Elliott, D., Fernández, R., & Pezzelle, S. (2021). Probing Cross-Modal Representations in Multi-Step Relational Reasoning. In Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021) (pp. 152-162). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.repl4nlp-1.16

Vancouver

Parfenova I, Elliott D, Fernández R, Pezzelle S. Probing Cross-Modal Representations in Multi-Step Relational Reasoning. In Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021). Association for Computational Linguistics. 2021. p. 152-162 https://doi.org/10.18653/v1/2021.repl4nlp-1.16

Author

Parfenova, Iuliia ; Elliott, Desmond ; Fernández, Raquel ; Pezzelle, Sandro. / Probing Cross-Modal Representations in Multi-Step Relational Reasoning. Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021). Association for Computational Linguistics, 2021. pp. 152-162

Bibtex

@inproceedings{aad9156a011d499abdab8905bbaa33ea,
  title = "Probing Cross-Modal Representations in Multi-Step Relational Reasoning",
  abstract = "We investigate the representations learned by vision and language models in tasks that require relational reasoning. Focusing on the problem of assessing the relative size of objects in abstract visual contexts, we analyse both one-step and two-step reasoning. For the latter, we construct a new dataset of three-image scenes and define a task that requires reasoning at the level of the individual images and across images in a scene. We probe the learned model representations using diagnostic classifiers. Our experiments show that pretrained multimodal transformer-based architectures can perform higher-level relational reasoning, and are able to learn representations for novel tasks and data that are very different from what was seen in pretraining.",
  author = "Iuliia Parfenova and Desmond Elliott and Raquel Fern{\'a}ndez and Sandro Pezzelle",
  year = "2021",
  doi = "10.18653/v1/2021.repl4nlp-1.16",
  language = "English",
  pages = "152--162",
  booktitle = "Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)",
  publisher = "Association for Computational Linguistics",
  note = "6th Workshop on Representation Learning for NLP (RepL4NLP-2021) ; Conference date: 01-08-2021 Through 01-08-2021",
}

RIS

TY  - GEN
T1  - Probing Cross-Modal Representations in Multi-Step Relational Reasoning
AU  - Parfenova, Iuliia
AU  - Elliott, Desmond
AU  - Fernández, Raquel
AU  - Pezzelle, Sandro
PY  - 2021
Y1  - 2021
N2  - We investigate the representations learned by vision and language models in tasks that require relational reasoning. Focusing on the problem of assessing the relative size of objects in abstract visual contexts, we analyse both one-step and two-step reasoning. For the latter, we construct a new dataset of three-image scenes and define a task that requires reasoning at the level of the individual images and across images in a scene. We probe the learned model representations using diagnostic classifiers. Our experiments show that pretrained multimodal transformer-based architectures can perform higher-level relational reasoning, and are able to learn representations for novel tasks and data that are very different from what was seen in pretraining.
AB  - We investigate the representations learned by vision and language models in tasks that require relational reasoning. Focusing on the problem of assessing the relative size of objects in abstract visual contexts, we analyse both one-step and two-step reasoning. For the latter, we construct a new dataset of three-image scenes and define a task that requires reasoning at the level of the individual images and across images in a scene. We probe the learned model representations using diagnostic classifiers. Our experiments show that pretrained multimodal transformer-based architectures can perform higher-level relational reasoning, and are able to learn representations for novel tasks and data that are very different from what was seen in pretraining.
U2  - 10.18653/v1/2021.repl4nlp-1.16
DO  - 10.18653/v1/2021.repl4nlp-1.16
M3  - Article in proceedings
SP  - 152
EP  - 162
BT  - Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)
PB  - Association for Computational Linguistics
T2  - 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)
Y2  - 1 August 2021 through 1 August 2021
ER  -
