Detection of AI-Synthesized Speech Using Cepstral & Bispectral Statistics
arXiv - CS - Multimedia, Pub Date: 2020-09-03, DOI: arxiv-2009.01934
Arun K. Singh (1), Priyanka Singh (2) ((1) Indian Institute of Technology Jammu, (2) Dhirubhai Ambani Institute of Information and Communication Technology)

Digital technology has made previously unimaginable applications possible. Having a handful of tools for easy editing and manipulation is exciting, but it also raises alarming concerns, because manipulated audio can spread as speech clones, duplicates, or deepfakes. Validating the authenticity of speech is one of the primary problems of digital audio forensics. We propose an approach that distinguishes human speech from AI-synthesized speech using bispectral and cepstral analysis. Higher-order statistics are less correlated for human speech than for synthesized speech. In addition, cepstral analysis reveals a durable power component in human speech that is missing from synthesized speech. We integrate both analyses and propose a machine learning model to detect AI-synthesized speech.
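
The abstract names the two analyses without giving their definitions. As a rough illustration only, the sketch below computes a real cepstrum (the inverse FFT of the log magnitude spectrum) and a direct FFT-based bispectrum estimate, B(f1, f2) = E[X(f1) X(f2) X*(f1 + f2)], using NumPy. The function names, the segment length `nfft`, the Hann window, and the choice of summary statistics are all assumptions made for this example; they are not taken from the paper and should not be read as the authors' feature set or model.

```python
import numpy as np


def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.fft(x)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # small offset avoids log(0)
    return np.fft.ifft(log_mag).real


def bispectrum(x, nfft=64):
    """Direct bispectrum estimate averaged over non-overlapping segments.

    B[f1, f2] approximates E[X(f1) * X(f2) * conj(X(f1 + f2))],
    with frequency indices taken modulo nfft.
    """
    x = np.asarray(x, dtype=float)
    n_seg = len(x) // nfft
    if n_seg == 0:
        raise ValueError("signal is shorter than one segment")
    idx = np.arange(nfft)
    sum_idx = (idx[:, None] + idx[None, :]) % nfft  # index of f1 + f2
    window = np.hanning(nfft)  # assumed window choice
    acc = np.zeros((nfft, nfft), dtype=complex)
    for k in range(n_seg):
        seg = x[k * nfft:(k + 1) * nfft]
        seg = (seg - seg.mean()) * window  # remove DC, then taper
        X = np.fft.fft(seg)
        acc += X[:, None] * X[None, :] * np.conj(X[sum_idx])
    return acc / n_seg


def speech_features(x, nfft=64, n_ceps=13):
    """Hypothetical feature vector: 4 bispectral statistics + n_ceps cepstral values."""
    B = bispectrum(x, nfft)
    mag, phase = np.abs(B), np.angle(B)
    ceps = real_cepstrum(x)[:n_ceps]
    return np.concatenate([[mag.mean(), mag.var(), phase.mean(), phase.var()], ceps])
```

A feature vector of this kind could then be passed to any standard classifier trained on labelled human and synthesized recordings. Which statistics and which classifier work best is an empirical question that this sketch does not address.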

Updated: 2020-09-07