What Do You See? Evaluation of Explainable Artificial Intelligence (XAI) Interpretability through Neural Backdoors



arXiv - CS - Machine Learning. Pub Date: 2020-09-22, DOI: arxiv-2009.10639
Yi-Shan Lin, Wen-Chuan Lee, Z. Berkay Celik

Explainable AI (XAI) methods have been proposed to interpret how a deep neural network arrives at its predictions through model saliency explanations that highlight the parts of an input deemed important for reaching a decision on a specific target. However, it remains challenging to quantify the correctness of their interpretability, as current evaluation approaches either require subjective input from humans or incur high computation cost with automated evaluation. In this paper, we propose backdoor trigger patterns (hidden malicious functionalities that cause misclassification) to automate the evaluation of saliency explanations. Our key observation is that triggers provide ground truth for inputs, which makes it possible to evaluate whether the regions identified by an XAI method are truly relevant to its output. Since backdoor triggers are the most important features that cause deliberate misclassification, a robust XAI method should reveal their presence at inference time. We introduce three complementary metrics for systematic evaluation of the explanations that an XAI method generates, and evaluate seven state-of-the-art model-free and model-specific post hoc methods through 36 models trojaned with specifically crafted triggers that vary in color, shape, texture, location, and size. We found that six methods that use local explanation and feature relevance fail to completely highlight trigger regions, and only a model-free approach can uncover the entire trigger region.
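The abstract does not name its three metrics, so the following is only an illustrative sketch of the general idea of scoring a saliency map against a known trigger region. It assumes a simple intersection-over-union overlap measure, a fixed saliency threshold of 0.5, and hypothetical helper names (`threshold_mask`, `trigger_iou`); none of these are taken from the paper.

```python
from typing import List

SaliencyMap = List[List[float]]
Mask = List[List[bool]]


def threshold_mask(saliency: SaliencyMap, threshold: float) -> Mask:
    """Mark pixels whose saliency score is at least `threshold`."""
    return [[score >= threshold for score in row] for row in saliency]


def trigger_iou(saliency: SaliencyMap, trigger_mask: Mask,
                threshold: float = 0.5) -> float:
    """Intersection over union between highlighted pixels and the trigger.

    Assumption: overlap with the known trigger region is used as a proxy
    for explanation correctness; the threshold value is arbitrary.
    """
    highlighted = threshold_mask(saliency, threshold)
    intersection = 0
    union = 0
    for h_row, t_row in zip(highlighted, trigger_mask):
        for h, t in zip(h_row, t_row):
            intersection += int(h and t)
            union += int(h or t)
    return intersection / union if union else 0.0


# Toy 4x4 input: the trigger occupies the bottom-right 2x2 patch.
trigger = [
    [False, False, False, False],
    [False, False, False, False],
    [False, False, True, True],
    [False, False, True, True],
]
# Hypothetical saliency map: covers 3 of 4 trigger pixels plus 1 stray pixel.
saliency = [
    [0.1, 0.0, 0.0, 0.0],
    [0.0, 0.7, 0.0, 0.0],
    [0.0, 0.0, 0.9, 0.8],
    [0.0, 0.0, 0.2, 0.6],
]
score = trigger_iou(saliency, trigger)
print(f"IoU with trigger region: {score:.2f}")  # 3 shared / 5 in union = 0.60
```

A score of 1.0 would mean the explanation highlights exactly the trigger pixels, while a partial score such as this one corresponds to an explanation that misses part of the trigger or highlights unrelated pixels.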


Updated: 2020-09-23
