Heat and Blur: An Effective and Fast Defense Against Adversarial Examples
arXiv - CS - Machine Learning. Pub Date: 2020-03-17, DOI: arxiv-2003.07573
Haya Brama and Tal Grinshpoun

The growing incorporation of artificial neural networks (NNs) into many fields, and especially into life-critical systems, is restrained by their vulnerability to adversarial examples (AEs). Some existing defense methods can increase NNs' robustness, but they often require a special architecture or training procedure and cannot be applied to already-trained models. In this paper, we propose a simple defense that combines feature visualization with input modification and can therefore be applied to various pre-trained networks. By reviewing several interpretability methods, we gain new insights into how AEs influence an NN's computation. Based on these insights, we hypothesize that information about the "true" object is preserved within the NN's activity even when the input is adversarial, and we present a version of feature visualization that extracts this information in the form of relevance heatmaps. We then use these heatmaps as the basis for our defense, in which the adversarial effects are corrupted by massive blurring. We also propose a new evaluation metric that can capture the effects of both attacks and defenses more thoroughly and descriptively, and we demonstrate the effectiveness of the defense and the utility of the proposed metric with VGG19 results on the ImageNet dataset.
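The abstract does not spell out how the heatmaps and the blurring are combined, so the following is only a minimal sketch of one possible reading, not the authors' pipeline. It assumes a relevance heatmap has already been computed by some interpretability method (not shown), uses the normalized heatmap to weight the input, and then applies a heavy Gaussian blur. The function names `gaussian_blur` and `heatmap_guided_blur`, the weighting-then-blur order, and the default `sigma` are all hypothetical choices made for illustration. Only NumPy is used, and the classifier (VGG19 in the paper) is left out.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """1-D normalized Gaussian kernel with radius ceil(3 * sigma)."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur over the first two axes (H x W or H x W x C), edge-padded."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    out = img.astype(np.float64)
    for axis in (0, 1):
        n = out.shape[axis]
        pad = [(0, 0)] * out.ndim
        pad[axis] = (r, r)
        padded = np.pad(out, pad, mode="edge")
        # 1-D convolution along `axis`: weighted sum of shifted windows
        # (the kernel is symmetric, so correlation equals convolution).
        out = sum(w * np.take(padded, np.arange(i, i + n), axis=axis)
                  for i, w in enumerate(k))
    return out


def heatmap_guided_blur(image: np.ndarray, heatmap: np.ndarray,
                        sigma: float = 8.0) -> np.ndarray:
    """Hypothetical defense step: weight the input by a [0, 1]-normalized
    relevance heatmap, then blur heavily to wash out small adversarial details."""
    h = heatmap.astype(np.float64) - heatmap.min()
    if h.max() > 0:
        h = h / h.max()
    if image.ndim == 3:
        h = h[..., None]  # broadcast the H x W heatmap over color channels
    return gaussian_blur(image * h, sigma)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((32, 32, 3))       # stand-in for an input image
    heatmap = np.zeros((32, 32))
    heatmap[8:24, 8:24] = 1.0             # stand-in for a relevance heatmap
    defended = heatmap_guided_blur(image, heatmap, sigma=4.0)
    print(defended.shape)                 # (32, 32, 3)
```

A Gaussian blur is used here because it is the most common way to suppress high-frequency perturbations. The separable two-pass form keeps the sketch dependency-free. In practice a library routine such as SciPy's `gaussian_filter` would normally be preferred.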

Updated: 2020-03-18