Finding and removing Clever Hans: Using explanation methods to debug and improve deep models
Information Fusion (IF 14.7) Pub Date: 2021-08-03, DOI: 10.1016/j.inffus.2021.07.015
Christopher J. Anders, Leander Weber, David Neumann, Wojciech Samek, Klaus-Robert Müller, Sebastian Lapuschkin

Contemporary learning models for computer vision are typically trained on very large (benchmark) datasets with millions of samples. These may, however, contain biases, artifacts, or errors that have gone unnoticed and are exploitable by the model. In the worst case, the trained model does not learn a valid and generalizable strategy for the problem it was trained on, but instead becomes a "Clever Hans" predictor that bases its decisions on spurious correlations in the training data, potentially rendering it unrepresentative, unfair, and possibly even hazardous. In this paper, we contribute a comprehensive analysis framework based on a scalable statistical analysis of attributions from explanation methods over large data corpora. Building on a recent technique, Spectral Relevance Analysis, we propose the following technical contributions and resulting findings: (a) a scalable quantification of artifactual and poisoned classes for which the machine learning models under study exhibit Clever Hans behavior, and (b) several approaches, collectively denoted Class Artifact Compensation, that effectively and significantly reduce a model's Clever Hans behavior, i.e., that "un-Hans" models trained on (poisoned) datasets such as the popular ImageNet corpus. We demonstrate that Class Artifact Compensation, defined in a simple theoretical framework, may be implemented as part of a neural network's training or fine-tuning process, or post hoc, by injecting additional layers into the network architecture that prevent any further propagation of undesired Clever Hans features. Using the proposed methods, we provide qualitative and quantitative analyses of the biases and artifacts in, e.g., the ImageNet dataset, the Adience benchmark dataset of unfiltered faces, and the ISIC 2019 skin lesion analysis dataset. We demonstrate that these insights can give rise to improved, more representative, and fairer models operating on implicitly cleaned data corpora.
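
The first contribution can be sketched as follows. A minimal, hedged illustration of the SpRAy-style analysis step, assuming per-sample attribution maps (e.g., from LRP) have already been computed for one class; the function and parameter names below are illustrative, not taken from the paper's implementation:

```python
# Illustrative sketch only: spectral clustering of attribution maps to
# surface groups of samples sharing an atypical decision strategy
# (candidate Clever Hans behavior). Not the paper's actual pipeline.
import numpy as np
from scipy.ndimage import zoom
from sklearn.cluster import SpectralClustering

def spray_clusters(attributions, n_clusters=2, size=(16, 16)):
    """Cluster per-sample attribution heatmaps of one class.

    attributions: list of 2D numpy arrays (one heatmap per sample).
    Returns a cluster label per sample; small, coherent clusters are
    candidates for artifact-driven (Clever Hans) strategies.
    """
    # Downscale each heatmap so clustering compares coarse relevance structure.
    feats = np.stack([
        zoom(a, (size[0] / a.shape[0], size[1] / a.shape[1])).ravel()
        for a in attributions
    ])
    return SpectralClustering(
        n_clusters=n_clusters, affinity="nearest_neighbors", n_neighbors=10
    ).fit_predict(feats)
```

A cluster whose members concentrate relevance on the same unexpected region (e.g., a watermark or copyright tag) flags a candidate artifact, and the eigenvalue spectrum of the underlying graph Laplacian gives a scalable per-class score of how pronounced such substructure is.

The second contribution, the post-hoc variant of Class Artifact Compensation, can likewise be sketched as an injected layer. A hedged sketch, assuming the artifact has been summarized as a direction v in some intermediate feature space; the orthogonal projection below is an illustrative simplification of the paper's correction:

```python
# Illustrative sketch only: a layer that removes the component of each
# feature vector along an estimated artifact direction, so the identified
# Clever Hans feature cannot propagate to later layers.
import torch
import torch.nn as nn

class ArtifactSuppression(nn.Module):
    def __init__(self, artifact_direction: torch.Tensor):
        super().__init__()
        # Unit-normalize the (assumed, externally estimated) artifact direction.
        self.register_buffer("v", artifact_direction / artifact_direction.norm())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Project x onto the subspace orthogonal to v: x - (x . v) v.
        coeff = (x * self.v).sum(dim=-1, keepdim=True)
        return x - coeff * self.v
```

Such a module can be inserted between existing layers of an already trained network without retraining, which matches the abstract's description of a post-hoc injection of additional layers.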




Updated: 2021-08-03