Explaining Data-Driven Decisions made by AI Systems: The Counterfactual Approach
arXiv - CS - Artificial Intelligence. Pub Date: 2020-01-21. DOI: arxiv-2001.07417. Carlos Fernández-Loría, Foster Provost, Xintian Han
Lack of understanding of the decisions made by model-based AI systems is an
important barrier to their adoption. We examine counterfactual explanations as
an alternative for explaining AI decisions. The counterfactual approach defines
an explanation as a set of the system's data inputs that causally drives the
decision (meaning that removing them changes the decision) and is irreducible
(meaning that removing any subset of the inputs in the explanation does not
change the decision). We generalize previous work on counterfactual
explanations, resulting in a framework that (a) is model-agnostic, (b) can
address features with arbitrary data types, (c) can explain decisions made by
complex AI systems that incorporate multiple models, and (d) is scalable to
large numbers of features. We also propose a heuristic procedure to find the
most useful explanations depending on the context. We contrast counterfactual
explanations with another alternative: methods that explain model predictions
by weighting features according to their importance (e.g., SHAP, LIME). This
paper presents two fundamental reasons why explaining model predictions is not
the same as explaining the decisions made using those predictions, suggesting
we should carefully consider whether importance-weight explanations are
well-suited to explain decisions made by AI systems. Specifically, we show that
(1) features that have a large importance weight for a model prediction may not
actually affect the corresponding decision, and (2) importance weights are
insufficient to communicate whether and how features influence system
decisions. We demonstrate this with several examples, including three detailed
case studies that compare the counterfactual approach with SHAP to illustrate
various conditions under which counterfactual explanations explain data-driven
decisions better than feature importance weights.
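The counterfactual idea in the abstract can be sketched as a smallest-first subset search: find a set of inputs whose "removal" (replacement by baseline values) flips the decision, where searching smaller sets first guarantees irreducibility. The sketch below is only an illustration of that definition, not the authors' heuristic procedure; the scoring function, feature names, and baseline values are hypothetical. The toy weights also illustrate point (1): `f3` carries the largest weight, yet removing `f3` alone does not change the decision.

```python
# Minimal sketch of a counterfactual explanation search, assuming the AI
# system is a scoring function plus a decision threshold. Illustrative
# only; the real paper proposes a scalable heuristic, not brute force.
from itertools import combinations

def decision(score_fn, x, threshold=0.5):
    """Turn a model score into a binary decision."""
    return score_fn(x) >= threshold

def counterfactual_explanation(score_fn, x, baseline, threshold=0.5):
    """Find an irreducible set of features whose removal (replacement by
    baseline values) flips the decision. Searching smallest sets first
    means no proper subset of the returned set flips the decision."""
    original = decision(score_fn, x, threshold)
    features = list(x)
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            x_cf = dict(x)
            for f in subset:
                x_cf[f] = baseline[f]  # "remove" the feature
            if decision(score_fn, x_cf, threshold) != original:
                return set(subset)  # irreducible by construction
    return None  # no counterfactual exists for this instance

# Hypothetical linear scorer: f3 has the largest weight (importance),
# yet removing f3 alone (score 0.8) does not cross the 0.5 threshold.
def score(x):
    return 0.4 * x["f1"] + 0.4 * x["f2"] + 0.6 * x["f3"]

x = {"f1": 1, "f2": 1, "f3": 1}          # score = 1.4 -> decision True
baseline = {"f1": 0, "f2": 0, "f3": 0}

explanation = counterfactual_explanation(score, x, baseline)
# yields an irreducible flipping set; here {'f1', 'f3'}, since no
# single feature flips the decision but removing f1 and f3 together does
```

Brute-force enumeration is exponential in the number of features, which is exactly why the paper's scalability contribution (point (d)) matters for realistic systems.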
Updated: 2020-05-12