Where is the Model Looking At -- Concentrate and Explain the Network Attention
IEEE Journal of Selected Topics in Signal Processing (IF 8.7). Pub Date: 2020-03-01. DOI: 10.1109/jstsp.2020.2987729
Wenjia Xu, Jiuniu Wang, Yang Wang, Guangluan Xu, Daoyu Lin, Wei Dai, Yirong Wu

Image classification models have achieved satisfactory performance on many datasets, sometimes even surpassing humans. However, where the model attends is unclear because such models lack interpretability. This paper investigates the fidelity and interpretability of model attention. We propose an Explainable Attribute-based Multi-task (EAT) framework that concentrates the model attention on the discriminative image area and makes the attention interpretable. We introduce attribute prediction into a multi-task learning network, helping the network concentrate its attention on the foreground objects. We generate attribute-based textual explanations for the network and ground the attributes on the image to provide visual explanations. The multi-modal explanation can not only improve user trust but also help reveal weaknesses of the network and the dataset. Our framework can be generalized to any base model. We perform experiments on three datasets and five base models. Results indicate that the EAT framework can give multi-modal explanations that interpret the network decision, and that guiding network attention improves the performance of several recognition approaches.
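The abstract gives no implementation details, but its core idea (a shared backbone with a classification head plus an auxiliary attribute-prediction head, trained jointly so that attribute supervision pulls attention toward the foreground object) can be sketched in PyTorch. This is a minimal illustrative sketch, not the paper's actual implementation: the names EATNet and eat_loss, the ResNet-18 backbone, and the 0.5 attribute-loss weight are all assumptions.

    # Minimal multi-task sketch, assuming PyTorch and torchvision >= 0.13.
    import torch
    import torch.nn as nn
    import torchvision.models as models

    class EATNet(nn.Module):
        """Shared backbone with a class head and an attribute head.

        The attribute head is the auxiliary task that, per the abstract,
        helps concentrate attention on the foreground object.
        """
        def __init__(self, num_classes: int, num_attributes: int):
            super().__init__()
            backbone = models.resnet18(weights=None)  # any base model could be swapped in
            # Keep everything up to (and including) the global average pool.
            self.features = nn.Sequential(*list(backbone.children())[:-1])
            feat_dim = backbone.fc.in_features
            self.class_head = nn.Linear(feat_dim, num_classes)
            self.attr_head = nn.Linear(feat_dim, num_attributes)

        def forward(self, x):
            f = self.features(x).flatten(1)          # (N, feat_dim)
            return self.class_head(f), self.attr_head(f)

    def eat_loss(class_logits, attr_logits, labels, attributes, attr_weight=0.5):
        # Joint objective: standard classification loss plus a binary
        # cross-entropy term over the attribute annotations. The weighting
        # is an illustrative choice, not the paper's reported value.
        cls = nn.functional.cross_entropy(class_logits, labels)
        attr = nn.functional.binary_cross_entropy_with_logits(attr_logits, attributes.float())
        return cls + attr_weight * attr

A visual explanation in the spirit of the paper could then ground each predicted attribute on the image by, for example, computing a class-activation map from that attribute's logit over the final convolutional features, while the predicted attributes themselves supply the textual explanation.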
