Learning to fool the speaker recognition
arXiv - CS - Sound Pub Date : 2020-04-07 , DOI: arxiv-2004.03434
Jiguo Li, Xinfeng Zhang, Jizheng Xu, Li Zhang, Yue Wang, Siwei Ma, Wen Gao

Due to the widespread deployment of fingerprint, face, and speaker recognition systems, attacks on deep-learning-based biometric systems have drawn increasing attention. Previous research has mainly studied attacks on vision-based systems, such as fingerprint and face recognition. Attacks on speaker recognition, however, have not yet been investigated, even though speaker recognition is widely used in daily life. In this paper, we attempt to fool a state-of-the-art speaker recognition model and present the speaker recognition attacker, a lightweight model that fools a deep speaker recognition model by adding imperceptible perturbations to the raw speech waveform. We find that speaker recognition systems are also vulnerable to such attacks, and we achieve a high success rate on the non-targeted attack. We also present an effective method for optimizing the speaker recognition attacker to trade off attack success rate against perceptual quality. Experiments on the TIMIT dataset show that we can achieve a sentence error rate of 99.2% with an average SNR of 57.2 dB and a PESQ of 4.2, at a speed faster than real time.
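To make the SNR figure concrete, the following is a minimal sketch of how the signal-to-noise ratio between a clean waveform and its perturbed copy could be computed. It assumes the conventional power-ratio definition, SNR = 10 log10(sum(x^2) / sum((x' - x)^2)); the abstract does not state which definition the authors use. The function name `snr_db`, the synthetic sine-wave "speech", and the random perturbation are all illustrative and are not taken from the paper.

```python
import math
import random


def snr_db(clean, perturbed):
    """Return the SNR in dB, treating (perturbed - clean) as the noise.

    Assumption: the standard power-ratio definition; the paper's exact
    formula is not given in the abstract.
    """
    signal_power = sum(c * c for c in clean)
    noise_power = sum((p - c) ** 2 for c, p in zip(clean, perturbed))
    if noise_power == 0.0:
        return math.inf  # identical waveforms: no perturbation at all
    return 10.0 * math.log10(signal_power / noise_power)


# Hypothetical stand-in for one second of speech at 16 kHz: a 220 Hz sine.
sample_rate = 16000
clean = [0.5 * math.sin(2.0 * math.pi * 220.0 * n / sample_rate)
         for n in range(sample_rate)]

# Hypothetical perturbation: Gaussian noise rescaled to hit a target SNR.
random.seed(0)
raw_noise = [random.gauss(0.0, 1.0) for _ in clean]
target_snr_db = 57.2
scale = math.sqrt(
    sum(c * c for c in clean)
    / (sum(d * d for d in raw_noise) * 10.0 ** (target_snr_db / 10.0))
)
perturbed = [c + scale * d for c, d in zip(clean, raw_noise)]

print(f"SNR of perturbed waveform: {snr_db(clean, perturbed):.1f} dB")
```

At an SNR near 57 dB the perturbation carries several hundred thousand times less power than the signal, which is consistent with the abstract's description of the perturbation as imperceptible.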

Updated: 2020-04-08