From Anecdotal Evidence to Quantitative Evaluation Methods: A Systematic Review on Evaluating Explainable AI
ACM Computing Surveys (IF 16.6). Pub Date: 2023-02-24. DOI: 10.1145/3583558
Meike Nauta¹, Jan Trienes², Shreyasi Pathak¹, Elisa Nguyen, Michelle Peters³, Yasmin Schmitt, Jörg Schlötterer², Maurice van Keulen³, Christin Seifert²

The rising popularity of explainable artificial intelligence (XAI) for understanding high-performing black boxes has raised the question of how to evaluate explanations of machine learning (ML) models. While interpretability and explainability are often presented as a subjectively validated binary property, we consider them a multi-faceted concept. We identify 12 conceptual properties, such as Compactness and Correctness, that should be evaluated to comprehensively assess the quality of an explanation. Our so-called Co-12 properties serve as a categorization scheme for systematically reviewing the evaluation practices of more than 300 papers, published in the last 7 years at major AI and ML conferences, that introduce an XAI method. We find that one in three papers evaluates exclusively with anecdotal evidence, and one in five papers evaluates with users. This survey also contributes to the call for objective, quantifiable evaluation methods by presenting an extensive overview of quantitative XAI evaluation methods. Our systematic collection of evaluation methods provides researchers and practitioners with concrete tools to thoroughly validate, benchmark, and compare new and existing XAI methods. The Co-12 categorization scheme and our identified evaluation methods open up opportunities to include quantitative metrics as optimization criteria during model training, in order to optimize for accuracy and interpretability simultaneously.
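To make the idea of a quantitative evaluation method concrete, the sketch below implements one common instance of the Correctness family: an incremental-deletion check, which removes the features an explanation ranks as most important and tracks how quickly the model's confidence drops. This is a minimal illustration, not the paper's reference implementation; the names (`deletion_score`, `predict_fn`), the zero baseline, and the uniform deletion schedule are illustrative assumptions.

```python
import numpy as np

def deletion_score(predict_fn, x, importance, baseline=0.0, steps=10):
    """Incremental-deletion check (one instance of the Co-12 'Correctness'
    family): delete the most important features first and record how the
    model's confidence for the originally predicted class decays.

    predict_fn : callable mapping a batch of inputs to class probabilities
    x          : 1-D feature vector being explained
    importance : per-feature importance scores from the XAI method
    baseline   : value that 'deletes' a feature (0, feature mean, noise, ...)
    steps      : number of deletion increments
    """
    order = np.argsort(importance)[::-1]           # most important first
    target = int(np.argmax(predict_fn(x[None])))   # class being explained
    x_del = x.copy()
    confidences = []
    chunk = max(1, len(order) // steps)
    for i in range(0, len(order), chunk):
        x_del[order[i:i + chunk]] = baseline       # delete next chunk
        confidences.append(predict_fn(x_del[None])[0, target])
    # A faithful explanation yields a fast confidence drop, i.e. a small
    # area under the deletion curve (approximated here by the mean).
    return float(np.mean(confidences))
```

With a scikit-learn classifier, `predict_fn` could simply be `clf.predict_proba`; lower scores indicate explanations that more faithfully rank the features the model actually relies on.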




Updated: 2023-02-24