Reverse engineering imperceptible backdoor attacks on deep neural networks for detection and training set cleansing
Computers & Security (IF 5.6), Pub Date: 2021-04-22, DOI: 10.1016/j.cose.2021.102280
Zhen Xiang , David J. Miller , George Kesidis

Backdoor data poisoning (a.k.a. Trojan attack) is an emerging form of adversarial attack, usually against deep neural network image classifiers. The attacker poisons the training set with a relatively small set of images from one (or several) source class(es), embedded with a backdoor pattern and labeled to a target class. For a successful attack, during operation, the trained classifier will: 1) misclassify a test image from the source class(es) to the target class whenever the backdoor pattern is present; 2) maintain high classification accuracy for backdoor-free test images. In this paper, we make a breakthrough in defending against backdoor attacks with imperceptible backdoor patterns (e.g., watermarks) before/during the classifier training phase. This is a challenging problem because it is a priori unknown which subset (if any) of the training set has been poisoned. We propose an optimization-based reverse engineering defense that jointly: 1) detects whether the training set is poisoned; 2) if so, accurately identifies the target class and the training images with the backdoor pattern embedded; and 3) additionally, reverse engineers an estimate of the backdoor pattern used by the attacker. In benchmark experiments on CIFAR-10 (as well as four other data sets), considering a variety of attacks, our defense achieves a new state-of-the-art by reducing the attack success rate to no more than 4.9% after removing detected suspicious training images.
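To make the attack and defense mechanisms concrete, below is a minimal PyTorch sketch, not the authors' exact method: the functions embed_backdoor and reverse_engineer_pattern, the toy linear model, and the eps/n_steps settings are all illustrative assumptions. It only mirrors the general idea described in the abstract: the attacker additively embeds a small, imperceptible pattern into source-class images, and a defender searches, via gradient-based optimization, for a norm-bounded additive pattern that drives images toward a candidate target class; an unusually small pattern with a high induced misclassification rate is evidence of poisoning.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def embed_backdoor(x, pattern):
    """Attack side: additively embed an imperceptible backdoor pattern into
    source-class images (which the attacker labels as the target class
    before inserting them into the training set)."""
    return torch.clamp(x + pattern, 0.0, 1.0)


def reverse_engineer_pattern(model, images, target_class,
                             eps=8 / 255, n_steps=200, lr=0.01):
    """Defense side (sketch): optimize a norm-bounded additive perturbation
    that drives `images` toward `target_class`. A small pattern achieving a
    high misclassification rate for some class suggests the training set is
    poisoned, with that class as the estimated target."""
    pattern = torch.zeros_like(images[:1], requires_grad=True)
    opt = torch.optim.Adam([pattern], lr=lr)
    target = torch.full((images.shape[0],), target_class, dtype=torch.long)
    for _ in range(n_steps):
        logits = model(torch.clamp(images + pattern, 0.0, 1.0))
        loss = F.cross_entropy(logits, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            pattern.clamp_(-eps, eps)  # keep the estimated pattern imperceptible
    with torch.no_grad():
        preds = model(torch.clamp(images + pattern, 0.0, 1.0)).argmax(dim=1)
        success_rate = (preds == target).float().mean().item()
    return pattern.detach(), success_rate


if __name__ == "__main__":
    # Toy stand-in for a trained CIFAR-10 classifier; an actual defense would
    # run this optimization per putative target class over the suspect set.
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    images = torch.rand(16, 3, 32, 32)
    pat, rate = reverse_engineer_pattern(model, images, target_class=0)
    print(f"pattern L-inf norm: {pat.abs().max():.4f}, induced rate: {rate:.2f}")
```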



Updated: 2021-05-03