Generating Out of Distribution Adversarial Attack Using Latent Space Poisoning
IEEE Signal Processing Letters (IF 3.2) Pub Date: 2021-02-23, DOI: 10.1109/lsp.2021.3061327
Ujjwal Upadhyay, Prerana Mukherjee

Traditional adversarial attacks rely upon perturbations generated from the network's gradients, which are generally obtained via gradient-guided search to provide an adversarial counterpart to the network. In this letter, we propose a novel framework for generating adversarial examples in which the actual image is not corrupted; rather, its latent space representation is used to tamper with the inherent structure of the image while keeping its perceptual quality intact, so that the result acts as a legitimate data sample. As opposed to gradient-based attacks, latent space poisoning exploits the inclination of classifiers to model the independent and identically distributed nature of the training dataset and tricks them by producing out-of-distribution samples. We train a disentangled variational autoencoder ($\beta$-VAE) to model the data in latent space and then add noise perturbations to the latent space using a class-conditioned distribution function, under the constraint that the result is misclassified to the target label. Our empirical results on the MNIST, SVHN, and CelebA datasets validate that the generated adversarial examples can easily fool robust $l_0$, $l_2$, and $l_{\infty}$ norm classifiers designed using provably robust defense mechanisms. The source code is made publicly available at https://github.com/Ujjwal-9/latent-space-poisoning
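To make the abstract's pipeline concrete, the following is a minimal PyTorch sketch of the latent space poisoning idea: encode an image with a pretrained $\beta$-VAE, perturb only the latent code, and optimize the perturbation so the decoded image is misclassified to a target label. It assumes a VAE exposing `vae.encode`/`vae.decode` and a target classifier `clf`; the Gaussian initialization and the quadratic penalty are illustrative stand-ins for the paper's class-conditioned distribution function, not the authors' implementation (see the linked repository for that).

```python
# Minimal sketch of latent space poisoning, assuming a pretrained beta-VAE
# with `encode`/`decode` methods and a target classifier `clf`. All names
# and hyperparameters here are illustrative, not the authors' actual API.
import torch
import torch.nn.functional as F

def latent_space_attack(vae, clf, x, target_label,
                        steps=200, lr=0.05, noise_scale=0.1):
    """Perturb the latent code of x so that the decoded image is
    classified as target_label; pixel space is never edited directly."""
    vae.eval()
    clf.eval()
    with torch.no_grad():
        mu, logvar = vae.encode(x)  # latent posterior parameters (assumed API)
    # Initialize the perturbation from a small Gaussian -- a stand-in for
    # the paper's class-conditioned distribution, which is not reproduced here.
    delta = (noise_scale * torch.randn_like(mu)).requires_grad_(True)
    opt = torch.optim.Adam([delta], lr=lr)
    target = torch.full((x.size(0),), target_label,
                        dtype=torch.long, device=x.device)
    for _ in range(steps):
        x_adv = vae.decode(mu + delta)   # decode the poisoned latent code
        logits = clf(x_adv)
        # Push the classifier toward the target label while keeping the
        # latent shift small, so perceptual quality stays intact.
        loss = F.cross_entropy(logits, target) + 0.1 * delta.pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return vae.decode(mu + delta).detach()
```

Because the optimization runs entirely in latent space, the decoded sample remains on the VAE's learned data manifold, which is what lets it pass as a legitimate, in-distribution-looking input while actually lying outside the classifier's training distribution.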

Updated: 2021-03-23