Deep Learning Serves Voice Cloning: How Vulnerable Are Automatic Speaker Verification Systems to Spoofing Trials?, IEEE Communications Magazine

IEEE Communications Magazine (IF 11.2), Pub Date: 2020-02-01, DOI: 10.1109/mcom.001.1900396
Pavol Partila, Jaromir Tovarek, Gokhan Hakki Ilk, Jan Rozhon, Miroslav Voznak

This article examines the reliability of automatic speaker verification (ASV) systems against new synthesis methods based on deep neural networks. ASV systems are widely used for secure and effective biometric authentication. On the other hand, the rapid deployment of ASV systems draws increased attention from attackers, who use newer and more sophisticated spoofing methods. Until recently, speech synthesis of the reference speaker did not seriously compromise the latest ASV systems. This situation is changing with the deployment of deep neural networks in the synthesis process. Projects including WaveNet, Deep Voice, Voice Loop, and many others generate very natural, high-quality speech that may clone voice identity. We are slowly approaching an era in which we will not be able to distinguish a genuine voice from a synthesized one. Therefore, it is necessary to determine the robustness of current ASV systems to new methods of voice cloning. In this article, well-known SVM- and GMM-based ASV systems, as well as newer CNN-based ones, are subjected to synthesized speech from the Tacotron 2 with WaveNet TTS system. The results of this work confirm our concerns regarding the reliability of ASV systems against synthesized speech.

Updated: 2020-02-01
