The Devil is in the GAN: Defending Deep Generative Models Against Backdoor Attacks
arXiv - CS - Cryptography and Security. Pub Date: 2021-08-03. arXiv: 2108.01644
Ambrish Rawat, Killian Levacher, Mathieu Sinn

Deep Generative Models (DGMs) allow users to synthesize data from complex, high-dimensional manifolds. Industry applications of DGMs include data augmentation to boost the performance of (semi-)supervised machine learning, or to mitigate fairness or privacy concerns. Large-scale DGMs are notoriously hard to train, requiring expert skills, large amounts of data and extensive computational resources. Thus, it can be expected that many enterprises will resort to sourcing pre-trained DGMs from potentially unverified third parties, e.g. open-source model repositories. As we show in this paper, such a deployment scenario poses a new attack surface, which allows adversaries to potentially undermine the integrity of entire machine learning development pipelines in a victim organization. Specifically, we describe novel training-time attacks that result in corrupted DGMs which synthesize regular data under normal operation but produce designated target outputs for inputs sampled from a trigger distribution. Depending on the control that the adversary has over the random number generation, this imposes varying degrees of risk that harmful data may enter the machine learning development pipelines, potentially causing material or reputational damage to the victim organization. Our attacks are based on adversarial loss functions that combine the dual objectives of attack stealth and fidelity. We show their effectiveness for a variety of DGM architectures (Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs)) and data domains (images, audio). Our experiments show that, even for large-scale industry-grade DGMs, our attack can be mounted with only modest computational effort. We also investigate the effectiveness of different defensive approaches (based on static/dynamic model and output inspections) and prescribe a practical defense strategy that paves the way for the safe usage of DGMs.
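To make the dual-objective loss mentioned in the abstract concrete, the sketch below illustrates one plausible way such a corrupted generator could be trained in PyTorch: a stealth term keeps its outputs close to those of a frozen clean reference generator on benign latent samples, while an attack term forces latents drawn from a trigger distribution to map to the adversary's target output. This is an illustrative assumption, not the authors' exact formulation; the names G, G0, x_target, trigger_mean and lam are hypothetical.

# Minimal, hypothetical sketch of the dual-objective adversarial loss --
# not the paper's exact method. A corrupted generator G is trained so that
# (i) its outputs on benign latents stay close to a frozen clean reference
# generator G0 (attack stealth), and (ii) latents from a trigger
# distribution map to a designated target output x_target (attack fidelity).
import torch
import torch.nn.functional as F

def backdoor_training_step(G, G0, optimizer, x_target,
                           latent_dim=100, batch_size=64,
                           trigger_mean=5.0, lam=1.0, device="cpu"):
    G0.eval()  # frozen clean generator, used only as a stealth reference

    # Benign latents: the standard normal prior typically used for sampling.
    z_benign = torch.randn(batch_size, latent_dim, device=device)
    # Trigger latents: a narrow, shifted region of latent space (assumed).
    z_trigger = trigger_mean + 0.01 * torch.randn(batch_size, latent_dim, device=device)

    with torch.no_grad():
        reference = G0(z_benign)  # what an uncorrupted model would emit

    out_benign = G(z_benign)
    out_trigger = G(z_trigger)

    loss_stealth = F.mse_loss(out_benign, reference)  # mimic the clean model
    loss_attack = F.mse_loss(out_trigger, x_target.expand_as(out_trigger))  # hit the target

    loss = loss_stealth + lam * loss_attack
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Conversely, a defender applying the output-inspection idea mentioned in the abstract would sample the model well beyond the benign latent prior (including shifted regions like the assumed trigger distribution above) and flag outputs that differ sharply from typical samples.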

Updated: 2021-08-04