Rather a Nurse than a Physician - Contrastive Explanations under Investigation
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Rather a Nurse than a Physician - Contrastive Explanations under Investigation. / Eberle, Oliver; Chalkidis, Ilias; Cabello, Laura; Brandl, Stephanie.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (ACL), 2023. pp. 6907-6920.
RIS
TY - GEN
T1 - Rather a Nurse than a Physician - Contrastive Explanations under Investigation
AU - Eberle, Oliver
AU - Chalkidis, Ilias
AU - Cabello, Laura
AU - Brandl, Stephanie
PY - 2023
Y1 - 2023
N2 - Contrastive explanations, where one decision is explained *in contrast to another*, are supposed to be closer to how humans explain a decision than non-contrastive explanations, where the decision is not necessarily referenced to an alternative. This claim has never been empirically validated. We analyze four English text-classification datasets (SST2, DynaSent, BIOS, and DBpedia-Animals). We fine-tune and extract explanations from three different models (RoBERTa, GPT-2, and T5), each in three different sizes, and apply three post-hoc explainability methods (LRP, GradientxInput, GradNorm). We furthermore collect and release human rationale annotations for a subset of 100 samples from the BIOS dataset for contrastive and non-contrastive settings. A cross-comparison between model-based rationales and human annotations, both in contrastive and non-contrastive settings, yields a high agreement between the two settings for models as well as for humans. Moreover, model-based explanations computed in both settings align equally well with human rationales. Thus, we empirically find that humans do not necessarily explain in a contrastive manner.
AB - Contrastive explanations, where one decision is explained *in contrast to another*, are supposed to be closer to how humans explain a decision than non-contrastive explanations, where the decision is not necessarily referenced to an alternative. This claim has never been empirically validated. We analyze four English text-classification datasets (SST2, DynaSent, BIOS, and DBpedia-Animals). We fine-tune and extract explanations from three different models (RoBERTa, GPT-2, and T5), each in three different sizes, and apply three post-hoc explainability methods (LRP, GradientxInput, GradNorm). We furthermore collect and release human rationale annotations for a subset of 100 samples from the BIOS dataset for contrastive and non-contrastive settings. A cross-comparison between model-based rationales and human annotations, both in contrastive and non-contrastive settings, yields a high agreement between the two settings for models as well as for humans. Moreover, model-based explanations computed in both settings align equally well with human rationales. Thus, we empirically find that humans do not necessarily explain in a contrastive manner.
U2 - 10.18653/v1/2023.emnlp-main.427
DO - 10.18653/v1/2023.emnlp-main.427
M3 - Article in proceedings
SP - 6907
EP - 6920
BT - Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
PB - Association for Computational Linguistics (ACL)
T2 - 2023 Conference on Empirical Methods in Natural Language Processing
Y2 - 6 December 2023 through 10 December 2023
ER -
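For readers unfamiliar with the distinction the abstract draws, here is a minimal PyTorch sketch of GradientxInput attribution in both the non-contrastive and contrastive settings: the contrastive variant simply attributes the *difference* between the target logit and a foil logit ("nurse rather than physician"). It uses a toy linear classifier in place of the fine-tuned RoBERTa/GPT-2/T5 models, and the class indices are illustrative assumptions; this is a sketch of the general technique, not the paper's code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in: a linear probe over token embeddings instead of
# the fine-tuned RoBERTa / GPT-2 / T5 models analyzed in the paper.
embed_dim, num_classes, seq_len = 16, 4, 8
model = nn.Linear(embed_dim, num_classes)

# One input sequence of token embeddings; requires_grad so we can attribute.
embeddings = torch.randn(seq_len, embed_dim, requires_grad=True)
logits = model(embeddings).mean(dim=0)  # mean-pool tokens -> class logits

target, foil = 1, 2  # illustrative class indices, e.g. "nurse" vs "physician"

# Non-contrastive GradientxInput: gradient of the target logit alone,
# multiplied elementwise with the input, summed to one score per token.
(grad_t,) = torch.autograd.grad(logits[target], embeddings, retain_graph=True)
noncontrastive = (grad_t * embeddings).sum(dim=-1)

# Contrastive GradientxInput: same computation, but on the logit difference,
# i.e. explaining why the model predicts the target *rather than* the foil.
(grad_c,) = torch.autograd.grad(logits[target] - logits[foil], embeddings)
contrastive = (grad_c * embeddings).sum(dim=-1)

print("non-contrastive per-token scores:", noncontrastive.detach())
print("contrastive per-token scores:", contrastive.detach())
```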