Synthetic speech detection through short-term and long-term prediction traces
EURASIP Journal on Information Security, Pub Date: 2021-04-06, DOI: 10.1186/s13635-021-00116-3
Clara Borrelli, Paolo Bestagini, Fabio Antonacci, Augusto Sarti, Stefano Tubaro

Several methods for synthetic speech generation have been developed in the literature over the years. With the great technological advances brought by deep learning, many novel synthetic speech techniques achieving incredibly realistic results have recently been proposed. As these methods generate convincing fake human voices, they can be used maliciously to negatively impact today's society (e.g., impersonation, fake news spreading, opinion formation). For this reason, the ability to detect whether a speech recording is synthetic or pristine is becoming an urgent necessity. In this work, we develop a synthetic speech detector. It takes an audio recording as input, extracts a series of hand-crafted features motivated by the speech-processing literature, and classifies them in either a closed-set or an open-set scenario. The proposed detector is validated on a publicly available dataset consisting of 17 synthetic speech generation algorithms, ranging from old-fashioned vocoders to modern deep learning solutions. Results show that the proposed method outperforms recently proposed detectors in the forensics literature.
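
The abstract does not spell out the hand-crafted features, so the following is only an illustrative sketch of what a short-term prediction "trace" might look like: a linear predictor is fitted to one audio frame with the classical autocorrelation method, and the prediction gain (signal energy over residual energy) is reported. The function names, the choice of prediction gain as the feature, and the use of NumPy are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Fit a short-term linear predictor with the autocorrelation method.

    Solves the normal equations R a = r, where R is the Toeplitz matrix of
    autocorrelation lags 0..order-1 and r holds lags 1..order.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    # Tiny ridge term (an assumption) to keep the system well conditioned
    R = R + 1e-9 * r[0] * np.eye(order)
    return np.linalg.solve(R, r[1:])

def prediction_residual(frame, coeffs):
    """Return the prediction error e[n] = x[n] - sum_k a_k x[n-k]."""
    frame = np.asarray(frame, dtype=float)
    order = len(coeffs)
    predicted = np.zeros_like(frame)
    for k, a_k in enumerate(coeffs, start=1):
        predicted[k:] += a_k * frame[:-k]
    # Drop the first `order` samples, which lack a full prediction history
    return (frame - predicted)[order:]

def prediction_gain_db(frame, order):
    """Hypothetical feature: ratio of signal energy to residual energy, in dB."""
    frame = np.asarray(frame, dtype=float)
    coeffs = lpc_coefficients(frame, order)
    residual = prediction_residual(frame, coeffs)
    return 10.0 * np.log10(np.mean(frame[order:] ** 2) / np.mean(residual ** 2))
```

In a detector of the kind the abstract describes, values such as this would be computed per frame for several predictor orders and the resulting statistics passed to a classifier; the exact feature set and classifier are left to the paper itself.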

Updated: 2021-04-08