Detection of Iterative Adversarial Attacks via Counter Attack
arXiv - CS - Cryptography and Security Pub Date : 2020-09-23 , DOI: arxiv-2009.11397
Matthias Rottmann, Mathis Peyron, Natasa Krejic and Hanno Gottschalk

Deep neural networks (DNNs) have proven to be powerful tools for processing unstructured data. However, for high-dimensional data such as images, they are inherently vulnerable to adversarial attacks. Small, almost invisible perturbations added to the input can be used to fool DNNs. Various attacks, hardening methods and detection methods have been introduced in recent years. Notoriously, Carlini-Wagner (CW) type attacks computed by iterative minimization are among the most difficult to detect. In this work, we demonstrate that such iterative minimization attacks can be used as detectors themselves. Thus, in some sense, we show that one can fight fire with fire. This work also outlines a mathematical proof that, under certain assumptions, this detector provides asymptotically optimal separation of original and attacked images. In numerical experiments, we obtain AUROC values of up to 99.73% for our detection method. This distinctly surpasses state-of-the-art detection rates for CW attacks from the literature. We also give numerical evidence that our method is robust against the attacker's choice of attack method.
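The abstract does not spell out the detection statistic, so the following is only a toy sketch of one plausible reading: an attacked input sits just past the decision boundary, so a second ("counter") attack needs a much smaller perturbation to flip the label again than it would for an original input. Everything here is an assumption made for illustration: a linear classifier stands in for the DNN, a fixed-step walk toward the boundary stands in for a CW-type attack, and the names `iterative_attack`, `counter_attack_score` and `auroc` are hypothetical rather than taken from the paper.

```python
import numpy as np

def predict(w, b, x):
    """Binary label of a linear classifier (stand-in for a DNN)."""
    return int(w @ x + b > 0)

def iterative_attack(w, b, x, step=0.01, max_iter=10_000):
    """Walk x toward the decision boundary in small steps until the label flips.

    A simplified stand-in for an iterative minimization attack such as CW.
    """
    label = predict(w, b, x)
    direction = w / np.linalg.norm(w)
    if label == 1:
        direction = -direction  # move toward the class-0 side
    x_adv = x.copy()
    for _ in range(max_iter):
        if predict(w, b, x_adv) != label:
            break
        x_adv = x_adv + step * direction
    return x_adv

def counter_attack_score(w, b, x):
    """Assumed detection statistic: L2 size of the perturbation a counter attack needs."""
    return float(np.linalg.norm(iterative_attack(w, b, x) - x))

def auroc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = np.asarray(scores_pos)[:, None]
    neg = np.asarray(scores_neg)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

rng = np.random.default_rng(0)
w, b = np.array([1.0, -2.0]), 0.5
originals = 3.0 * rng.normal(size=(50, 2))
attacked = np.array([iterative_attack(w, b, x) for x in originals])

orig_scores = [counter_attack_score(w, b, x) for x in originals]
att_scores = [counter_attack_score(w, b, x) for x in attacked]

# A small counter-attack distance suggests an attacked input, so negate the scores
# to make "attacked" the high-scoring (positive) class.
detection_auroc = auroc([-s for s in att_scores], [-s for s in orig_scores])
print(f"toy AUROC: {detection_auroc:.4f}")
```

In this toy setting the attacked points end up within one or two steps of the boundary, which is why the counter-attack distance separates them from most originals. It says nothing about how well the idea transfers to real networks or how the paper's reported AUROC values were obtained.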

Updated: 2020-09-25