Do Natural Language Explanations Represent Valid Logical Arguments? Verifying Entailment in Explainable NLI Gold Standards
arXiv - CS - Artificial Intelligence | Pub Date: 2021-05-05 | arXiv: 2105.01974
Marco Valentino, Ian Pratt-Hartmann, André Freitas

An emerging line of research in Explainable NLP is the creation of datasets enriched with human-annotated explanations and rationales, used to build and evaluate models with step-wise inference and explanation generation capabilities. While human-annotated explanations are used as ground truth for inference, there is a lack of systematic assessment of their consistency and rigour. In an attempt to provide a critical quality assessment of Explanation Gold Standards (XGSs) for NLI, we propose a systematic annotation methodology, named Explanation Entailment Verification (EEV), to quantify the logical validity of human-annotated explanations. The application of EEV on three mainstream datasets reveals the surprising conclusion that a majority of the explanations, while appearing coherent on the surface, represent logically invalid arguments, ranging from being incomplete to containing clearly identifiable logical errors. This conclusion confirms that the inferential properties of explanations are still poorly formalised and understood, and that additional work along this line of research is necessary to improve the way Explanation Gold Standards are constructed.
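The validity criterion at the heart of the abstract can be made concrete with a toy propositional check. The sketch below is only an illustration of the textbook definition the paper appeals to (an argument is valid iff the conclusion holds in every model satisfying the premises); it is not the EEV protocol itself, which the paper defines as a human annotation methodology. The variable names r and l and the helper is_valid are hypothetical.

```python
# Toy illustration of logical validity, NOT the paper's EEV protocol:
# an argument is valid iff no truth assignment makes all premises true
# while making the conclusion false.
from itertools import product

def is_valid(premises, conclusion, variables):
    """Brute-force propositional entailment check via truth-table enumeration."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(p(env) for p in premises) and not conclusion(env):
            return False  # counter-model: premises hold, conclusion fails
    return True

# Explanation read as premises: "a ball is round" (r) and
# "round things roll" (r -> l); conclusion: "a ball rolls" (l).
premises = [lambda e: e["r"], lambda e: (not e["r"]) or e["l"]]
conclusion = lambda e: e["l"]
print(is_valid(premises, conclusion, ["r", "l"]))      # True: valid argument

# Dropping the second premise mimics an "incomplete" explanation:
print(is_valid(premises[:1], conclusion, ["r", "l"]))  # False: non sequitur
```

The second call shows the failure mode the abstract highlights: an explanation that sounds coherent can still be a logically invalid (here, incomplete) argument once read as premises and a conclusion.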

Updated: 2021-05-06