Rationalizing predictions by adversarial information calibration
Artificial Intelligence (IF 14.4), Pub Date: 2022-11-17, DOI: 10.1016/j.artint.2022.103828
Lei Sha, Oana-Maria Camburu, Thomas Lukasiewicz

Explaining the predictions of AI models is paramount in safety-critical applications, such as in legal or medical domains. One form of explanation for a prediction is an extractive rationale, i.e., a subset of features of an instance that lead the model to give its prediction on that instance. For example, the subphrase “he stole the mobile phone” can be an extractive rationale for the prediction of “Theft”. Previous works on generating extractive rationales usually employ a two-phase model: a selector that selects the most important features (i.e., the rationale) followed by a predictor that makes the prediction based exclusively on the selected features. One disadvantage of these works is that the main signal for learning to select features comes from the comparison of the answers given by the predictor to the ground-truth answers. In this work, we propose to squeeze more information from the predictor via an information calibration method. More precisely, we train two models jointly: one is a typical neural model that solves the task at hand in an accurate but black-box manner, and the other is a selector-predictor model that additionally produces a rationale for its prediction. The first model is used as a guide for the second model. We use an adversarial technique to calibrate the information extracted by the two models such that the difference between them is an indicator of the missed or over-selected features. In addition, for natural language tasks, we propose a language-model-based regularizer to encourage the extraction of fluent rationales. Experimental results on a sentiment analysis task, a hate speech recognition task, as well as on three tasks from the legal domain show the effectiveness of our approach to rationale extraction.
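The abstract outlines the training setup only at a high level; the following is a minimal, self-contained PyTorch sketch of how such a pipeline could be wired up. Everything here is an illustrative assumption rather than the paper's implementation: the module names (GuideModel, SelectorPredictor, disc), the layer sizes, the soft sigmoid rationale mask (the paper plausibly uses discrete selection), the mean-pooled features fed to the discriminator, and a simple continuity penalty standing in for the paper's language-model-based fluency regularizer.

```python
# Hypothetical sketch of selector-predictor training with adversarial
# information calibration. Not the authors' code; all names and
# hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB, HID, VOCAB, NUM_CLASSES = 64, 128, 1000, 2

class GuideModel(nn.Module):
    """Black-box guide: encodes the full input and predicts the label."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.enc = nn.GRU(EMB, HID, batch_first=True)
        self.clf = nn.Linear(HID, NUM_CLASSES)
    def forward(self, x):
        h, _ = self.enc(self.emb(x))
        feat = h.mean(dim=1)              # pooled feature vector
        return self.clf(feat), feat

class SelectorPredictor(nn.Module):
    """Selector scores each token; predictor sees only the masked input."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.sel = nn.GRU(EMB, HID, batch_first=True)
        self.gate = nn.Linear(HID, 1)
        self.enc = nn.GRU(EMB, HID, batch_first=True)
        self.clf = nn.Linear(HID, NUM_CLASSES)
    def forward(self, x):
        e = self.emb(x)
        s, _ = self.sel(e)
        z = torch.sigmoid(self.gate(s))   # soft rationale mask in [0, 1]
        h, _ = self.enc(e * z)            # predictor sees selected tokens only
        return self.clf(h.mean(dim=1)), h.mean(dim=1), z.squeeze(-1)

guide, sp = GuideModel(), SelectorPredictor()
disc = nn.Sequential(nn.Linear(HID, HID), nn.ReLU(), nn.Linear(HID, 1))
d_opt = torch.optim.Adam(disc.parameters(), lr=1e-3)
g_opt = torch.optim.Adam(list(guide.parameters()) + list(sp.parameters()), lr=1e-3)

x = torch.randint(0, VOCAB, (8, 20))      # toy batch: 8 sequences of 20 tokens
y = torch.randint(0, NUM_CLASSES, (8,))

for step in range(3):
    # 1) Discriminator: tell guide features (real) from rationale features (fake).
    with torch.no_grad():
        _, g_feat = guide(x)
        _, s_feat, _ = sp(x)
    d_loss = (F.binary_cross_entropy_with_logits(disc(g_feat), torch.ones(8, 1))
              + F.binary_cross_entropy_with_logits(disc(s_feat), torch.zeros(8, 1)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 2) Joint update: task loss for both models, plus a calibration term that
    #    pushes the rationale features to be indistinguishable from the guide's,
    #    plus sparsity/continuity penalties on the mask.
    g_logits, g_feat = guide(x)
    s_logits, s_feat, z = sp(x)
    task = F.cross_entropy(g_logits, y) + F.cross_entropy(s_logits, y)
    calib = F.binary_cross_entropy_with_logits(disc(s_feat), torch.ones(8, 1))
    sparsity = z.mean()                               # prefer short rationales
    continuity = (z[:, 1:] - z[:, :-1]).abs().mean()  # stand-in for LM fluency term
    loss = task + 0.5 * calib + 0.1 * sparsity + 0.1 * continuity
    g_opt.zero_grad(); loss.backward(); g_opt.step()
```

A soft mask keeps everything differentiable; to obtain hard extractive rationales at inference time, one would threshold z, or train with a discrete relaxation (e.g., Gumbel-Softmax) or REINFORCE instead of the sigmoid gate used here.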




Updated: 2022-11-17