Spatial reconstructed local attention Res2Net with F0 subband for fake speech detection
Neural Networks ( IF 7.8 ) Pub Date : 2024-04-16 , DOI: 10.1016/j.neunet.2024.106320
Cunhang Fan , Jun Xue , Jianhua Tao , Jiangyan Yi , Chenglong Wang , Chengshi Zheng , Zhao Lv

The rhythm of bonafide speech is often difficult to replicate, so the fundamental frequency (F0) of synthetic speech differs significantly from that of real speech. The F0 feature is therefore expected to carry discriminative information for the fake speech detection (FSD) task. In this paper, we propose a novel F0 subband for FSD. To model the F0 subband effectively and improve FSD performance, we also propose the spatial reconstructed local attention Res2Net (SR-LA Res2Net). Specifically, Res2Net serves as the backbone network to obtain multiscale information and is enhanced with a spatial reconstruction mechanism to avoid losing important information as the channel groups are repeatedly superimposed. In addition, local attention is designed to make the model focus on the local information of the F0 subband. Experimental results on the ASVspoof 2019 LA dataset show that the proposed method obtains an equal error rate (EER) of 0.47% and a minimum tandem detection cost function (min t-DCF) of 0.0159, achieving state-of-the-art performance among single systems.
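For readers unfamiliar with the EER metric reported above, the following is a minimal sketch of how an equal error rate can be estimated from detection scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that a higher score means "more bonafide-like", and the toy scores are all assumptions made for illustration.

```python
import numpy as np


def equal_error_rate(bonafide_scores, spoof_scores):
    """Estimate the EER and its threshold (hypothetical helper, not from the paper).

    Assumes a higher score means the utterance looks more bonafide.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Candidate thresholds: every distinct observed score, in ascending order.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))

    # False rejection rate: bonafide utterances scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoofed utterances scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # The EER is taken at the threshold where the two error rates are closest.
    idx = int(np.argmin(np.abs(frr - far)))
    return (frr[idx] + far[idx]) / 2.0, thresholds[idx]


if __name__ == "__main__":
    # Toy scores for illustration only; these are not results from the paper.
    eer, thr = equal_error_rate([0.9, 0.8, 0.7, 0.6], [0.1, 0.2, 0.3, 0.65])
    print(f"EER = {eer:.2%} at threshold {thr}")
```

This sketch scans only the observed scores as thresholds and averages the two error rates at the closest crossing; official evaluation tooling may interpolate differently, so small numerical differences are to be expected.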

Updated: 2024-04-16