Does External Knowledge Help Explainable Natural Language Inference? Automatic Evaluation vs. Human Ratings
arXiv - CS - Computation and Language. Pub Date: 2021-09-16, arXiv:2109.07833
Hendrik Schuff, Hsiu-Yu Yang, Heike Adel, Ngoc Thang Vu

Natural language inference (NLI) requires models to learn and apply commonsense knowledge. These reasoning abilities are particularly important for explainable NLI systems, which generate a natural language explanation in addition to their label prediction. The integration of external knowledge has been shown to improve NLI systems; here we investigate whether it can also improve their explanation capabilities. For this, we examine different sources of external knowledge and evaluate the performance of our models on in-domain data as well as on dedicated transfer datasets designed to assess fine-grained reasoning capabilities. We find that different sources of knowledge affect reasoning abilities differently; for example, implicit knowledge stored in language models can hinder reasoning about numbers and negations. Finally, we conduct the largest and most fine-grained explainable NLI crowdsourcing study to date. It reveals that even large differences in automatic performance scores are not reflected in human ratings of label, explanation, commonsense, or grammar correctness.
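To make the notion of explainable NLI concrete, the following is a minimal illustrative sketch, not taken from the paper: it assumes an e-SNLI-style format in which each example pairs a label prediction with a free-text explanation, and the ExplainableNLIExample class and its external_knowledge field (ConceptNet-style triples) are hypothetical placeholders for how retrieved knowledge might be attached to an instance.

# Minimal sketch of an explainable NLI instance (illustrative assumption, not the paper's code).
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExplainableNLIExample:
    premise: str
    hypothesis: str
    label: str                 # one of: "entailment", "neutral", "contradiction"
    explanation: str           # free-text justification produced alongside the label
    external_knowledge: List[str] = field(default_factory=list)  # hypothetical retrieved facts


example = ExplainableNLIExample(
    premise="A man is playing a guitar on stage.",
    hypothesis="A person is performing music.",
    label="entailment",
    explanation="Playing a guitar on stage is a way of performing music.",
    external_knowledge=["guitar IsA musical_instrument", "stage UsedFor performing"],
)

print(example.label, "-", example.explanation)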

Updated: 2021-09-17