Detection of replay spoof speech using teager energy feature cues, Computer Speech & Language

Detection of replay spoof speech using teager energy feature cues
Computer Speech & Language (IF 4.3), Pub Date: 2020-08-14, DOI: 10.1016/j.csl.2020.101140
Madhu R. Kamble, Hemant A. Patil

The vulnerability of Automatic Speaker Verification (ASV) systems to spoofing, or presentation attacks, remains an open security issue. In this context, replay spoofing attacks pose a great threat to an ASV system because they are easy to perform: they require only a playback device and no technical skill. In this paper, we analyze replay speech signals in terms of the reverberation that may occur during recording of the speech signal. Such reverberation introduces delay and changes in amplitude, producing close copies of the speech signal, which significantly influences the replay components. To that effect, we propose to exploit the capabilities of the Teager Energy Operator (TEO) to compute a running estimate of subband energies for replay vs. genuine signals. We use a linearly-spaced Gabor filterbank to obtain narrowband filtered signals. The TEO is able to track the instantaneous changes of a signal. Experiments are performed on the ASVspoof 2017 Challenge version 2.0 database using a Gaussian Mixture Model (GMM) as the pattern classifier. Furthermore, we compared our results with state-of-the-art feature sets, namely, Constant Q Cepstral Coefficients (CQCC), Linear Frequency Cepstral Coefficients (LFCC), and Mel Frequency Cepstral Coefficients (MFCC), and used their score-level fusion with the proposed feature set, Teager Energy Cepstral Coefficients (TECC), to obtain possible complementary information that further reduces the Equal Error Rate (EER). Score-level fusion of the CQCC, MFCC, LFCC, and TECC feature sets yields relatively low EERs of 6.68% and 10.45% on the development and evaluation sets, respectively. Moreover, for the evaluation set, we also studied the performance of the TECC feature set under different Replay Configurations (RC), namely, acoustic environments, playback devices, and recording devices. For all threat levels to an ASV system (i.e., low, medium, and high), the proposed feature set performed better than existing state-of-the-art feature sets. In addition to the ASVspoof 2017 Challenge database, we also performed experiments on other spoofing databases, namely, the ASVspoof 2015 Challenge, BTAS 2016, and ASVspoof 2019 Challenge databases. For all the spoofing databases used in this study, the proposed TECC feature set performs significantly better than the other feature sets.
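
The abstract does not state the operator's formula. As a minimal sketch, assuming the standard discrete-time form of the TEO, psi[n] = x[n]^2 - x[n-1] * x[n+1], the snippet below shows why the operator gives a running energy estimate for a narrowband signal: for a pure tone A*cos(omega*n + phase) the output is the constant (A*sin(omega))^2 at every sample. The function name `teager_energy` and the test-tone parameters are illustrative assumptions, and the paper's Gabor filterbank and cepstral steps are not reproduced here.

```python
import math


def teager_energy(x):
    """Standard discrete TEO: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns values for interior samples only, so the output is two
    samples shorter than the input.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Illustrative narrowband input: a single tone (parameters are assumptions).
amplitude, omega, phase = 0.5, 0.3, 0.1
tone = [amplitude * math.cos(omega * n + phase) for n in range(200)]

psi = teager_energy(tone)

# For a pure tone the TEO output is constant: (A * sin(omega))^2.
expected = (amplitude * math.sin(omega)) ** 2
max_deviation = max(abs(p - expected) for p in psi)
print(f"expected energy: {expected:.6f}, max deviation: {max_deviation:.2e}")
```

Because the estimate depends on only three neighbouring samples, it follows amplitude and frequency changes almost instantaneously. This is consistent with applying the operator to each narrowband subband signal rather than to the full-band speech, where the single-tone interpretation would not hold.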

Updated: 2020-08-19
