Fast speech adversarial example generation for keyword spotting system with conditional GAN, Computer Communications

Computer Communications (IF 4.5) Pub Date: 2021-08-20, DOI: 10.1016/j.comcom.2021.08.010
Donghua Wang, Li Dong, Rangding Wang, Diqun Yan

Deep network-based keyword spotting (KWS) has achieved great success in many speech assistant applications. However, such network-based KWS systems have been shown to be vulnerable to adversarial attacks. In this work, we propose using a conditional generative adversarial network (CGAN) to efficiently craft targeted speech adversarial examples. Specifically, we first transform the attack target label into a vector, which serves as the condition input of the CGAN. The generator in the CGAN is tasked with generating a perturbation that causes the adversarial example to be misclassified as the pre-specified target keyword, while simultaneously deceiving the discriminator into classifying the adversarial example as genuine. The discriminator aims to distinguish the crafted adversarial examples from legitimate samples. Second, the target network-based KWS classifier(s) are ensembled and integrated into the proposed CGAN framework to force the generator to construct model-independent perturbations. The classification error loss of the target KWS is back-propagated to guide the weight updates of the generator. Finally, with a properly devised network architecture and training procedure, we obtain a well-trained generator that produces the adversarial perturbation for a given speech clip and target label. Experimental results show that the crafted adversarial examples can effectively attack a state-of-the-art KWS system with a high attack success rate while attaining acceptable perceptual quality.
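
The abstract describes the generator's training signal only in words, so the following is a minimal sketch of how the pieces could fit together, not the authors' implementation. It shows the target label turned into a one-hot condition vector and a generator loss that combines a discriminator term with a targeted classification loss averaged over an ensemble of KWS classifiers. All function names, the weights `alpha` and `beta`, and the L2 penalty on the perturbation (used here as a stand-in for the perceptual-quality constraint) are assumptions made for illustration.

```python
import math


def one_hot(label_index, num_classes):
    """Encode the target keyword index as a condition vector (assumed one-hot)."""
    if not 0 <= label_index < num_classes:
        raise ValueError("label_index out of range")
    return [1.0 if i == label_index else 0.0 for i in range(num_classes)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Targeted cross-entropy: small when the classifier predicts the target."""
    return -math.log(softmax(logits)[target_index])


def ensemble_target_loss(logits_per_model, target_index):
    """Average the targeted loss over all ensembled KWS classifiers."""
    losses = [cross_entropy(l, target_index) for l in logits_per_model]
    return sum(losses) / len(losses)


def generator_loss(d_prob_genuine, logits_per_model, target_index,
                   perturbation, alpha=1.0, beta=0.1):
    """Hypothetical combined generator objective.

    d_prob_genuine: discriminator's probability that the adversarial
        example is genuine (the generator wants this close to 1).
    alpha, beta: assumed weights, not taken from the paper.
    """
    gan_term = -math.log(max(d_prob_genuine, 1e-12))
    cls_term = ensemble_target_loss(logits_per_model, target_index)
    # Assumed L2 penalty that keeps the perturbation small.
    l2_term = math.sqrt(sum(p * p for p in perturbation))
    return gan_term + alpha * cls_term + beta * l2_term


if __name__ == "__main__":
    condition = one_hot(2, 4)
    # Logits from two hypothetical KWS classifiers for one adversarial clip.
    logits = [[0.1, 0.2, 3.0, 0.0], [0.0, 0.5, 2.0, 0.1]]
    print(condition, generator_loss(0.8, logits, 2, [0.01, -0.02, 0.005]))
```

In an actual training loop these quantities would be tensors in an automatic-differentiation framework so that the classification loss can be back-propagated into the generator's weights; the pure-Python version above only illustrates how the terms combine.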

Updated: 2021-08-26
