Adversarial attacks and defenses in explainable artificial intelligence: A survey
Information Fusion (IF 14.7), Pub Date: 2024-02-19, DOI: 10.1016/j.inffus.2024.102303
Hubert Baniecki, Przemyslaw Biecek

Explainable artificial intelligence (XAI) methods are portrayed as a remedy for debugging and trusting statistical and deep learning models, as well as interpreting their predictions. However, recent advances in adversarial machine learning (AdvML) highlight the limitations and vulnerabilities of state-of-the-art explanation methods, putting their security and trustworthiness into question. The possibility of manipulating, fooling or fairwashing evidence of the model’s reasoning has detrimental consequences when applied in high-stakes decision-making and knowledge discovery. This survey provides a comprehensive overview of research concerning adversarial attacks on explanations of machine learning models, as well as fairness metrics. We introduce a unified notation and taxonomy of methods facilitating a common ground for researchers and practitioners from the intersecting research fields of AdvML and XAI. We discuss how to defend against attacks and design robust interpretation methods. We contribute a list of existing insecurities in XAI and outline the emerging research directions in adversarial XAI (AdvXAI). Future work should address improving explanation methods and evaluation protocols to take into account the reported safety issues.
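To make the surveyed threat model concrete, the following PyTorch sketch (not code from the survey; the toy model, loss weights, and all names are illustrative) shows one attack family it covers: optimizing a small input perturbation that leaves the model's prediction nearly unchanged while steering a gradient-based saliency explanation toward an attacker-chosen target. A smooth activation is used so the explanation is itself differentiable with respect to the input, a common assumption in explanation-manipulation attacks.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy two-class model standing in for a real classifier; softplus keeps the
# saliency map differentiable w.r.t. the input, which the attack relies on.
model = nn.Sequential(nn.Linear(20, 32), nn.Softplus(), nn.Linear(32, 2))
x = torch.randn(1, 20)

def saliency(inp, create_graph=False):
    """Gradient of the top-class score w.r.t. the input: a basic saliency explanation."""
    inp = inp if inp.requires_grad else inp.detach().requires_grad_(True)
    score = model(inp).max(dim=1).values.sum()
    (grad,) = torch.autograd.grad(score, inp, create_graph=create_graph)
    return grad

orig_pred = model(x).detach()
orig_expl = saliency(x).detach()
target_expl = torch.zeros_like(x)   # attacker wants an uninformative, near-zero map

# Optimize a small perturbation: keep the output close to the original while
# pushing the explanation toward the attacker's target.
delta = torch.zeros_like(x, requires_grad=True)
opt = torch.optim.Adam([delta], lr=1e-2)
for _ in range(300):
    opt.zero_grad()
    x_adv = x + delta
    pred_loss = (model(x_adv) - orig_pred).pow(2).sum()
    expl_loss = (saliency(x_adv, create_graph=True) - target_expl).pow(2).sum()
    (expl_loss + 10.0 * pred_loss).backward()
    opt.step()

x_adv = (x + delta).detach()
print("prediction shift: ", (model(x_adv) - orig_pred).abs().max().item())
print("explanation shift:", (saliency(x_adv) - orig_expl).abs().max().item())
```

Defenses discussed in the survey target exactly this gap, e.g., by smoothing or aggregating explanations and by evaluating their robustness under such perturbations rather than only their faithfulness on clean inputs.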

Updated: 2024-02-19