Privacy Preserving Defense For Black Box Classifiers Against On-Line Adversarial Attacks.
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 23.6). Pub Date: 2022-11-07. DOI: 10.1109/tpami.2021.3125931
Rajkumar Theagarajan, Bir Bhanu

Deep learning models have been shown to be vulnerable to adversarial attacks. Adversarial attacks are imperceptible perturbations added to an image such that the deep learning model misclassifies the image with high confidence. Existing adversarial defenses validate their performance using only classification accuracy. However, classification accuracy by itself is not a reliable metric for determining whether the resulting image is "adversarial-free". This is a foundational problem for online image recognition applications, where the ground truth of the incoming image is not known; hence we can neither compute the accuracy of the classifier nor validate whether the image is "adversarial-free". This paper proposes a novel privacy preserving framework for defending black box classifiers from adversarial attacks using an ensemble of iterative adversarial image purifiers whose performance is continuously validated in a loop using Bayesian uncertainties. The proposed approach can convert a single-step black box adversarial defense into an iterative defense, and the paper introduces three novel privacy preserving Knowledge Distillation (KD) approaches that use prior meta-information from various datasets to mimic the performance of the black box classifier. Additionally, this paper proves the existence of an optimal distribution for the purified images that can reach a theoretical lower bound, beyond which the image can no longer be purified. Experimental results on six public benchmark datasets, namely 1) Fashion-MNIST, 2) CIFAR-10, 3) GTSRB, 4) MIO-TCD, 5) Tiny-ImageNet, and 6) MS-Celeb, show that the proposed approach consistently detects adversarial examples and purifies or rejects them against a variety of adversarial attacks.
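The abstract does not give implementation details, but the purify-and-validate loop it describes can be sketched generically. The snippet below is a hypothetical illustration, not the authors' code: it uses Monte Carlo dropout as one common stand-in for a Bayesian uncertainty estimate, an `nn.Identity` placeholder for a trained purifier ensemble, and an arbitrary predictive-entropy threshold `tau` for the accept/purify/reject decision.

```python
# Illustrative sketch only: a generic iterative purify-and-validate loop in the
# spirit of the abstract. Purifiers, uncertainty measure, and thresholds are
# placeholders, not the paper's actual method.
import torch

def mc_dropout_uncertainty(model, x, n_samples=20):
    """Predictive entropy via Monte Carlo dropout (assumes `model`
    contains dropout layers that stay active in train() mode)."""
    model.train()  # keep dropout stochastic at inference time
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=1) for _ in range(n_samples)]
        ).mean(dim=0)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    return probs, entropy

def iterative_purify(x, purifiers, classifier, tau=0.5, max_steps=10):
    """Repeatedly purify `x` (batch size 1 assumed) with an ensemble of
    purifiers until uncertainty drops below `tau`, else reject."""
    for _ in range(max_steps):
        probs, entropy = mc_dropout_uncertainty(classifier, x)
        if entropy.item() < tau:              # confident: accept prediction
            return probs.argmax(dim=1), x
        # average the ensemble's purified outputs and iterate
        x = torch.stack([p(x) for p in purifiers]).mean(dim=0)
    return None, x                            # still uncertain: reject

if __name__ == "__main__":
    import torch.nn as nn
    clf = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64),
                        nn.ReLU(), nn.Dropout(0.3), nn.Linear(64, 10))
    denoiser = nn.Identity()  # stand-in for a trained purifier network
    label, x_clean = iterative_purify(torch.rand(1, 3, 32, 32),
                                      [denoiser], clf, tau=1.5)
    print("accepted" if label is not None else "rejected")
```

In this toy run the untrained classifier stays uncertain and the input is rejected; with a trained classifier and real purifier networks the loop would instead converge to an accepted, purified image.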

Updated: 2021-11-08