Detecting Scene-Plausible Perceptible Backdoors in Trained DNNs without Access to the Training Set
Neural Computation (IF 2.7) Pub Date: 2021-02-23, DOI: 10.1162/neco_a_01376
Zhen Xiang, David J. Miller, Hang Wang, George Kesidis

Backdoor data poisoning attacks add mislabeled examples to the training set, with an embedded backdoor pattern, so that the classifier learns to classify to a target class whenever the backdoor pattern is present in a test sample. Here, we address posttraining detection of scene-plausible perceptible backdoors, a type of backdoor attack that can be relatively easily fashioned, particularly against DNN image classifiers. A posttraining defender does not have access to the potentially poisoned training set, only to the trained classifier, as well as some unpoisoned examples that need not be training samples. Without the poisoned training set, the only information about a backdoor pattern is encoded in the DNN's trained weights. This detection scenario is of great import considering legacy and proprietary systems, cell phone apps, as well as training outsourcing, where the user of the classifier will not have access to the entire training set. We identify two important properties of scene-plausible perceptible backdoor patterns, spatial invariance and robustness, based on which we propose a novel detector using the maximum achievable misclassification fraction (MAMF) statistic. We detect whether the trained DNN has been backdoor-attacked and infer the source and target classes. Our detector outperforms existing detectors and, coupled with an imperceptible backdoor detector, helps achieve posttraining detection of most evasive backdoors of interest.
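The abstract does not give implementation details, so the following is a minimal, hypothetical sketch (PyTorch-style) of how a MAMF-style probe could be computed for one putative (source, target) class pair: optimize a perceptible patch embedded in clean source-class images so as to maximize the fraction classified to the target class, then flag an attack if the best pair's MAMF exceeds a threshold. All function names (embed_patch, mamf_for_pair, detect_backdoor), the patch parameterization, and the hyperparameters (patch size, step count, learning rate, decision threshold) are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn.functional as F

def embed_patch(x, patch, mask):
    # Paste a candidate perceptible patch into a batch of images.
    # x: (B, C, H, W) clean source-class images scaled to [0, 1]
    # patch: (C, h, w) learnable patch values; mask: (1, h, w) binary footprint
    B, C, H, W = x.shape
    _, h, w = mask.shape
    # Fixed top-left placement here; the spatial invariance of scene-plausible
    # patterns suggests the exact location should matter little (assumption).
    full_patch = F.pad(patch.clamp(0.0, 1.0), (0, W - w, 0, H - h))
    full_mask = F.pad(mask, (0, W - w, 0, H - h))
    return full_mask * full_patch + (1.0 - full_mask) * x

def mamf_for_pair(model, src_images, target_class, patch_shape=(3, 8, 8),
                  steps=200, lr=0.1):
    # Estimate the maximum achievable misclassification fraction (MAMF) for one
    # putative (source, target) pair by optimizing a patch that pushes clean
    # source-class images toward the target class. Model assumed in eval mode.
    patch = torch.full(patch_shape, 0.5, requires_grad=True)
    mask = torch.ones(1, patch_shape[1], patch_shape[2])
    opt = torch.optim.Adam([patch], lr=lr)
    tgt = torch.full((src_images.size(0),), target_class, dtype=torch.long)
    for _ in range(steps):
        logits = model(embed_patch(src_images, patch, mask))
        loss = F.cross_entropy(logits, tgt)  # encourage target-class decisions
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        preds = model(embed_patch(src_images, patch, mask)).argmax(dim=1)
    return (preds == tgt).float().mean().item()

def detect_backdoor(model, clean_by_class, threshold=0.9):
    # Flag the classifier as attacked if any (source, target) pair reaches a
    # MAMF above the (assumed) threshold; also return the best pair found.
    best = (None, None, 0.0)
    for s in clean_by_class:
        for t in clean_by_class:
            if s == t:
                continue
            mamf = mamf_for_pair(model, clean_by_class[s], t)
            if mamf > best[2]:
                best = (s, t, mamf)
    s, t, mamf = best
    return {"attacked": mamf > threshold, "source": s, "target": t, "mamf": mamf}

The threshold on the largest MAMF over all class pairs is fixed at 0.9 here purely for illustration; in practice it would need calibration, and the paper's actual statistic, optimization, and decision rule may differ from this sketch.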




Updated: 2021-02-23