Adversarial attack and defense strategies for deep speaker recognition systems
Computer Speech & Language (IF 3.1), Pub Date: 2021-02-12, DOI: 10.1016/j.csl.2021.101199
Arindam Jati, Chin-Cheng Hsu, Monisankha Pal, Raghuveer Peri, Wael AbdAlmageed, Shrikanth Narayanan

Robust speaker recognition, including in the presence of malicious attacks, is becoming increasingly important, especially with the proliferation of smart speakers and personal agents that act on an individual’s voice commands to perform diverse and even sensitive tasks. Adversarial attacks are a recently revived line of research shown to be effective at breaking deep neural network-based classifiers, specifically by forcing them to change their posterior distribution while perturbing the input samples only by a very small amount. Although significant progress in this realm has been made in the computer vision domain, advances within speaker recognition are still limited. We present an expository paper that considers several adversarial attacks on a deep speaker recognition system, employs strong defense methods as countermeasures, and reports a comprehensive set of ablation studies to better understand the problem. The experiments show that speaker recognition systems are vulnerable to adversarial attacks, and the strongest attacks can reduce the accuracy of the system from 94% to as low as 0%. The study also compares the performance of the employed defense methods in detail, and finds adversarial training based on Projected Gradient Descent (PGD) to be the best defense method in our setting. We hope that the experiments presented in this paper provide baselines that are useful for the research community interested in further studying the adversarial robustness of speaker recognition systems.
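To illustrate the two ideas the abstract names (a small-budget attack that flips the classifier's posterior, and PGD-based adversarial training as a defense), below is a minimal PyTorch sketch. It assumes a hypothetical speaker classifier `model` mapping input utterances to speaker logits; the values of `eps`, `alpha`, and `steps` are illustrative and are not the paper's settings.

import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.002, alpha=0.0005, steps=10):
    # Start from a random point inside the L-infinity eps-ball around x.
    x_adv = x.detach() + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Take a signed gradient-ascent step, then project back into the ball.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    # Craft adversarial examples against the current model, then train on them.
    model.eval()  # keep batch-norm/dropout fixed while attacking
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()

The inner PGD loop approximately maximizes the classification loss within a perturbation budget of eps, which is what changing the posterior "by perturbing the input samples only by a very small amount" amounts to; training on those worst-case examples is the defense the study found strongest.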




Updated: 2021-02-16