Detection defense against adversarial attacks with saliency map
International Journal of Intelligent Systems (IF 7) Pub Date: 2021-05-08, DOI: 10.1002/int.22458
Dengpan Ye, Chuanxi Chen, Changrui Liu, Hao Wang, Shunzhi Jiang

It is well established that neural networks are vulnerable to adversarial examples, which are almost imperceptible to human vision and can cause deep models to misbehave. Such a phenomenon may lead to severe and hard-to-estimate consequences in safety- and security-critical applications. Existing defenses tend to harden the robustness of models against adversarial attacks, for example, through adversarial training. However, such defenses are usually intractable to implement due to the high cost of retraining and the cumbersome operations of altering the model architecture or parameters. In this paper, we discuss the saliency map method from the perspective of enhancing model interpretability; it is similar to introducing an attention mechanism into the model, so as to comprehend how the deep network identifies objects. We then propose a novel method that combines saliency maps with additional noise and uses an inconsistency strategy to detect adversarial examples. Our experimental results on common data sets, including ImageNet, and popular models show that our method can effectively detect all the representative adversarial attacks tested, with a high detection success rate. We compare it with the existing state-of-the-art technique, and the experiments indicate that our method is more general.
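The abstract does not spell out the detector, so the following is only a rough illustration of how a saliency-based inconsistency check of this general kind could be wired up in PyTorch: compute a vanilla gradient saliency map, recompute it after adding small random noise, and flag inputs whose saliency changes too much. The names `model`, `sigma`, and `tau` are hypothetical placeholders, not values from the paper.

```python
# Minimal sketch of saliency-inconsistency detection (illustrative only,
# not the authors' implementation).
import torch
import torch.nn.functional as F

def saliency_map(model, x):
    """Vanilla saliency: gradient of the top-class score w.r.t. the input."""
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    top_class = logits.argmax(dim=1)
    score = logits.gather(1, top_class.unsqueeze(1)).sum()
    score.backward()
    return x.grad.detach().abs()

def detect_adversarial(model, x, sigma=0.05, tau=0.5):
    """Flag inputs whose saliency map changes too much under small noise.

    sigma: assumed noise level; tau: assumed similarity threshold.
    """
    base = saliency_map(model, x)
    noisy = x + sigma * torch.randn_like(x)
    perturbed = saliency_map(model, noisy)
    # Cosine similarity between flattened saliency maps; low similarity
    # (inconsistency) is taken as a sign the input may be adversarial.
    sim = F.cosine_similarity(base.flatten(1), perturbed.flatten(1), dim=1)
    return sim < tau  # True -> flagged as adversarial
```

In this sketch, benign inputs are expected to yield stable saliency maps under mild noise, while adversarially perturbed inputs shift more; the actual noise design and decision rule in the paper may differ.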

Updated: 2021-05-08