Heat and Blur: An Effective and Fast Defense Against Adversarial Examples
arXiv - CS - Machine Learning. Pub Date: 2020-03-17. DOI: arxiv-2003.07573. Haya Brama and Tal Grinshpoun.
The growing incorporation of artificial neural networks (NNs) into many
fields, and especially into life-critical systems, is restrained by their
vulnerability to adversarial examples (AEs). Some existing defense methods can
increase NNs' robustness, but they often require special architectures or
training procedures and are inapplicable to already-trained models. In this
paper, we propose a simple defense that combines feature visualization with
input modification and can therefore be applied to various pre-trained
networks. By reviewing several interpretability methods, we gain new insights
into the influence of AEs on NNs' computation. Based on these, we
hypothesize that information about the "true" object is preserved within the
NN's activity even when the input is adversarial, and we present a variant of
feature visualization that can extract this information in the form of
relevance heatmaps. We then use these heatmaps as the basis for our defense, in
which the adversarial effects are corrupted by massive blurring. We also
provide a new evaluation metric that captures the effects of both attacks
and defenses more thoroughly and descriptively, and we demonstrate the
effectiveness of the defense and the utility of the suggested metric with
VGG19 results on the ImageNet dataset.
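
For a concrete picture of the pipeline, the Python sketch below illustrates the general idea of a heatmap-guided blur: the input is heavily Gaussian-blurred, then blended with the original according to a relevance heatmap. This is a hedged illustration only; the blending rule, the heat_and_blur name, and the random demo heatmap are assumptions made for this example, and the paper's actual heatmaps come from its feature-visualization variant, which is not reproduced here.

```python
# A minimal sketch of a heatmap-guided "heat and blur"-style defense.
# Assumptions (not from the paper): the blending rule and the demo
# heatmap below are placeholders for illustration.
import numpy as np
from scipy.ndimage import gaussian_filter


def heat_and_blur(image: np.ndarray, heatmap: np.ndarray, sigma: float = 10.0) -> np.ndarray:
    """Blur an image heavily, keeping heatmap-salient regions sharper.

    image:   H x W x C float array in [0, 1].
    heatmap: H x W relevance map in [0, 1]; high values mark pixels the
             network attributes to the "true" object.
    sigma:   Gaussian blur strength ("massive" blurring = large sigma).
    """
    # Blur every channel; this corrupts the fine-grained adversarial
    # perturbation along with other high-frequency detail.
    blurred = np.stack(
        [gaussian_filter(image[..., c], sigma=sigma) for c in range(image.shape[-1])],
        axis=-1,
    )
    # Blend: relevant regions stay close to the original, the rest is
    # dominated by the blurred version. This blending is one plausible
    # reading of "heatmaps as a basis for the defense", not the exact rule.
    weight = np.clip(heatmap, 0.0, 1.0)[..., None]
    return weight * image + (1.0 - weight) * blurred


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((224, 224, 3))    # stand-in for an input image
    heat = rng.random((224, 224))      # stand-in for a relevance heatmap
    defended = heat_and_blur(img, heat, sigma=10.0)
    print(defended.shape, defended.min() >= 0.0, defended.max() <= 1.0)
```

Because the sketch only post-processes the input, the heatmap could in principle come from any attribution method applied to the pre-trained network, which is consistent with the abstract's claim that the defense is applicable to various pre-trained models without retraining.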
Updated: 2020-03-18