Real-time, Robust and Adaptive Universal Adversarial Attacks Against Speaker Recognition Systems, Journal of Signal Processing Systems

Journal of Signal Processing Systems (IF 1.6), Pub Date: 2021-02-10, DOI: 10.1007/s11265-020-01629-9
Yi Xie, Zhuohang Li, Cong Shi, Jian Liu, Yingying Chen, Bo Yuan

Voice user interfaces (VUIs) have become increasingly popular in recent years. Speaker recognition systems, among the most common VUIs, have emerged as an important technique for security-sensitive applications and services. In this paper, we design, for the first time, a real-time, robust, and adaptive universal adversarial attack against state-of-the-art deep neural network (DNN) based speaker recognition systems in the white-box scenario. By developing an audio-agnostic universal perturbation, we can make DNN-based speaker recognition systems misidentify the speaker as the adversary's desired target label using a single perturbation that can be applied to any enrolled speaker's voice. In addition, we improve the robustness of our attack by modeling the sound distortions caused by physical over-the-air propagation through room impulse response (RIR) estimation. Moreover, we propose to adaptively adjust the magnitude of the perturbation for each individual utterance via spectral gating, which further improves the imperceptibility of the adversarial perturbations with only a minor increase in attack generation time. Experiments on a public dataset of 109 English speakers demonstrate the effectiveness and robustness of the proposed attack. Our method achieves an average attack success rate of 90% on both x-vector and d-vector speaker recognition systems. Meanwhile, it achieves a 100× speedup in attack launching time compared to conventional non-universal attacks.
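
The abstract names three mechanisms: one perturbation reused across utterances, a room impulse response (RIR) model of over-the-air playback, and per-utterance adjustment of the perturbation magnitude. The sketch below illustrates only the forward path of launching such an attack, assuming the universal perturbation `delta` has already been optimized against the target model (that white-box optimization is not shown). All function names are hypothetical, and the L-infinity budget `eps`, the tiling of `delta` to the utterance length, `frame`, and `thresh` are assumed parameters. The RIR step is plain linear convolution, and `energy_gate` is a simplified time-domain stand-in that scales the perturbation by frame RMS; it is not the paper's spectral-gating method.

```python
import math
from typing import List, Optional

Signal = List[float]


def tile(delta: Signal, n: int) -> Signal:
    """Repeat or truncate the universal perturbation to length n (assumed alignment)."""
    assert delta, "perturbation must be non-empty"
    return [delta[i % len(delta)] for i in range(n)]


def clip(v: Signal, eps: float) -> Signal:
    """Project onto an L-infinity ball of radius eps (assumed budget)."""
    return [max(-eps, min(eps, d)) for d in v]


def energy_gate(x: Signal, delta: Signal, frame: int, thresh: float) -> Signal:
    """Hypothetical adaptive scaling: shrink delta in frames where the utterance
    is quiet. A time-domain energy gate, NOT the paper's spectral gating."""
    assert frame > 0 and thresh > 0
    out: Signal = []
    for start in range(0, len(x), frame):
        seg = x[start:start + frame]
        rms = math.sqrt(sum(s * s for s in seg) / len(seg))
        gain = min(1.0, rms / thresh)  # full strength once frame RMS reaches thresh
        out.extend(d * gain for d in delta[start:start + frame])
    return out


def convolve(x: Signal, rir: Signal) -> Signal:
    """Full linear convolution, simulating playback through a room with impulse response rir."""
    y = [0.0] * (len(x) + len(rir) - 1)
    for i, s in enumerate(x):
        for j, h in enumerate(rir):
            y[i + j] += s * h
    return y


def launch_attack(x: Signal, delta: Signal, eps: float,
                  rir: Optional[Signal] = None,
                  frame: int = 4, thresh: float = 0.1) -> Signal:
    """Apply one precomputed universal perturbation to an arbitrary utterance x."""
    d = clip(tile(delta, len(x)), eps)
    d = energy_gate(x, d, frame, thresh)
    adv = [s + p for s, p in zip(x, d)]
    # Optionally simulate the over-the-air channel that the victim's microphone sees.
    return convolve(adv, rir) if rir else adv
```

For example, with `delta = [0.2, -0.2]` and `eps = 0.1`, the perturbation is clipped to ±0.1, kept at full strength over a loud frame, and zeroed over a silent one. Because `delta` is fixed ahead of time, launching on a new utterance costs only these element-wise steps, which is the intuition behind the launch-time speedup reported in the abstract.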

Updated: 2021-02-10