Generating Adversarial Inputs Using A Black-box Differential Technique
arXiv - CS - Cryptography and Security. Pub Date: 2020-07-10, DOI: arxiv-2007.05315
João Batista Pereira Matos Júnior, Lucas Carvalho Cordeiro, Marcelo d'Amorim, Xiaowei Huang

Neural Networks (NNs) are known to be vulnerable to adversarial attacks. A malicious agent initiates such an attack by perturbing an input into another one so that the two inputs are classified differently by the NN. In this paper, we consider a special class of adversarial examples that can exhibit not only the weakness of a single NN model, as typical adversarial examples do, but also the differing behavior of two NN models. We call them difference-inducing adversarial examples, or DIAEs. Specifically, we propose DAEGEN, the first black-box differential technique for adversarial input generation. DAEGEN takes as input two NN models trained for the same classification problem and reports an adversarial example as output. The obtained adversarial example is a DIAE, so it represents a point-wise difference between the two NN models in the input space. Algorithmically, DAEGEN uses a local search-based optimization algorithm to find DIAEs, iteratively perturbing an input to maximize the difference between the two models' predictions on it. We conduct experiments on a spectrum of benchmark datasets (e.g., MNIST, ImageNet, and Driving) and NN models (e.g., LeNet, ResNet, Dave, and VGG). Experimental results are promising. First, we compare DAEGEN with two existing white-box differential techniques (DeepXplore and DLFuzz) and find that, under the same setting, DAEGEN is 1) effective, i.e., it is the only technique that succeeds in generating attacks in all cases; 2) precise, i.e., the adversarial attacks are very likely to fool both machines and humans; and 3) efficient, i.e., it requires a reasonable number of classification queries. Second, we compare DAEGEN with state-of-the-art black-box adversarial attack methods (SimBA and TREMBA), adapting them to work in a differential setting. The experimental results show that DAEGEN performs better than both.
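To make the black-box differential idea concrete, the following is a minimal sketch of a greedy local search that perturbs an input to maximize two models' disagreement, in the spirit of the abstract's description. It is not the paper's exact algorithm: the divergence measure (L1 distance between output probability vectors), the single-coordinate SimBA-style moves, and the helper names and hyperparameters (query_a, query_b, epsilon, n_iters) are illustrative assumptions.

```python
# A minimal sketch of black-box differential local search, assuming each
# model is exposed only as a query function returning a probability vector
# (no gradients). Objective, step schedule, and names are assumptions,
# not the authors' exact method.
import numpy as np

def divergence(p_a: np.ndarray, p_b: np.ndarray) -> float:
    """Score how differently two models treat the same input.
    Here: L1 distance between softmax output vectors (an assumption;
    any measure that grows as the models disagree would do)."""
    return float(np.abs(p_a - p_b).sum())

def differential_local_search(x, query_a, query_b,
                              epsilon=0.05, n_iters=1000, seed=0):
    """Iteratively perturb `x` to maximize the two models' disagreement.

    query_a / query_b: black-box functions mapping an input array in
    [0, 1] to a class-probability vector. Plain greedy hill-climbing:
    a perturbation step is kept only if it increases the divergence.
    """
    rng = np.random.default_rng(seed)
    x_adv = np.asarray(x, dtype=np.float64).copy()
    best = divergence(query_a(x_adv), query_b(x_adv))
    for _ in range(n_iters):
        # Pick one coordinate and a signed step of size epsilon
        # (single-coordinate moves, chosen here for simplicity).
        idx = rng.integers(x_adv.size)
        step = epsilon * rng.choice([-1.0, 1.0])
        candidate = x_adv.copy()
        candidate.flat[idx] = np.clip(candidate.flat[idx] + step, 0.0, 1.0)
        score = divergence(query_a(candidate), query_b(candidate))
        if score > best:  # keep improving moves only
            x_adv, best = candidate, score
        # Success: the two models assign different top-1 labels,
        # i.e. x_adv is a difference-inducing adversarial example.
        if query_a(x_adv).argmax() != query_b(x_adv).argmax():
            return x_adv, best
    return x_adv, best
```

Each iteration costs a small, fixed number of classification queries, which is why query count is the natural efficiency metric in the comparisons above; the paper's actual search and objective should be taken from the arXiv source.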

Updated: 2020-07-13