Detecting Adversarial Image Examples in Deep Neural Networks with Adaptive Noise Reduction,IEEE Transactions on Dependable and Secure Computing



IEEE Transactions on Dependable and Secure Computing (IF 7.3), Pub Date: 2021-01-01, DOI: 10.1109/tdsc.2018.2874243
Bin Liang, Hongcheng Li, Miaoqiang Su, Xirong Li, Wenchang Shi, Xiaofeng Wang

Many recent studies have demonstrated that deep neural network (DNN) classifiers can be fooled by adversarial examples, which are crafted by introducing perturbations into an original sample. In response, several powerful defense techniques have been proposed. However, existing defenses often require modifying the target model or depend on prior knowledge of the attacks. In this paper, we propose a straightforward method for detecting adversarial image examples that can be deployed directly on unmodified off-the-shelf DNN models. We treat the perturbation added to an image as a kind of noise and apply two classic image processing techniques, scalar quantization and a smoothing spatial filter, to reduce its effect. Image entropy is used as a metric to adapt the noise reduction to different kinds of images. Consequently, an adversarial example can be detected effectively by comparing the classification result of a given sample with that of its denoised version, without any prior knowledge of the attacks. The proposed method is evaluated on more than 20,000 adversarial examples crafted with different attack techniques against several state-of-the-art DNN models. The experiments show that our detection method achieves a high overall F1 score of 96.39 percent and raises the bar for defense-aware attacks.
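
The detection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the entropy thresholds (`low`, `high`), the interval counts (2, 4, 6), the 3x3 mean filter, and the helper names are all assumptions chosen for illustration, and images are simplified to grayscale lists of integers in 0..255. The sketch only shows the overall shape: measure image entropy, pick a noise-reduction strength accordingly, and flag the input when the classifier's label changes after denoising.

```python
import math
from collections import Counter
from typing import Callable, List

Image = List[List[int]]  # grayscale image, pixel values in 0..255 (simplifying assumption)


def image_entropy(img: Image) -> float:
    """Shannon entropy (in bits) of the pixel-value histogram."""
    counts = Counter(p for row in img for p in row)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def scalar_quantize(img: Image, intervals: int) -> Image:
    """Map each pixel to the midpoint of one of `intervals` uniform bins."""
    width = 256 / intervals
    return [[int((int(p // width) + 0.5) * width) for p in row] for row in img]


def box_filter(img: Image, radius: int = 1) -> Image:
    """Mean (smoothing) filter; the window is clipped at the image borders."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(round(sum(window) / len(window)))
        out.append(row)
    return out


def denoise(img: Image, low: float = 4.0, high: float = 6.0) -> Image:
    """Entropy-adaptive noise reduction. Thresholds and interval counts are hypothetical."""
    entropy = image_entropy(img)
    if entropy < low:
        # Simple image: coarse quantization alone.
        return scalar_quantize(img, 2)
    if entropy < high:
        # Moderately detailed image: finer quantization.
        return scalar_quantize(img, 4)
    # Detailed image: fine quantization followed by smoothing.
    return box_filter(scalar_quantize(img, 6))


def is_adversarial(img: Image, classify: Callable[[Image], int]) -> bool:
    """Flag the input if the predicted label changes after denoising."""
    return classify(img) != classify(denoise(img))
```

Here `classify` stands in for any unmodified model that returns a label, which matches the abstract's point that the detector needs no change to the target model. Calling `is_adversarial(image, model_predict)` with a real model's prediction function would be the intended usage; suitable thresholds would have to be tuned on real data.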


Updated: 2021-01-01
