Replay attack detection with auditory filter-based relative phase features
EURASIP Journal on Audio, Speech, and Music Processing (IF 1.7), Pub Date: 2019-06-10, DOI: 10.1186/s13636-019-0151-2
Zeyan Oo, Longbiao Wang, Khomdet Phapatanaburi, Meng Liu, Seiichi Nakagawa, Masahiro Iwahashi, Jianwu Dang

Many studies address distinguishing human speech from artificially generated speech, as well as automatic speaker verification (ASV), which aims to determine whether a given utterance belongs to a given speaker. Recent studies demonstrate the success of the relative phase (RP) feature in speaker recognition/verification and in the detection of synthesized and converted speech. However, few studies focus on the RP feature for replay attack detection. In this paper, we improve the discriminating ability of the RP feature by proposing two new auditory filter-based RP features for replay attack detection. The key idea is to integrate the advantage of RP-based features in signal representation with the advantage of auditory filtering. For the first proposed feature, we apply a Mel filter bank to convert the signal representation of conventional RP information from a linear scale to a Mel scale; the modified representation is called the Mel-scale RP feature. For the other proposed feature, a gammatone filter bank is applied to scale the RP information; the scaled representation is called the gammatone-scale RP feature. These two proposed phase-based features are intended to achieve better performance than the conventional RP feature because of their scale resolution. In addition to using the individual Mel/gammatone-scale RP features, we combine the scores of the proposed RP features with those of a standard magnitude-based feature, the constant Q transform cepstral coefficient (CQCC), to further improve the reliability of the detection decision. The effectiveness of the proposed Mel-scale RP feature, the gammatone-scale RP feature, and their combination is evaluated using the ASVspoof 2017 dataset. On the evaluation dataset, the proposed methods demonstrate significant improvement over the existing feature and the baseline CQCC feature. The combination of the CQCC and the gammatone-scale RP performs best, outperforming each individual baseline feature and the other combination methods.
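
The two operations the abstract describes, rescaling a linear-frequency representation onto the Mel scale and combining detector scores, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the triangular filter shape, the per-filter normalization, and the linear fusion weight are all assumptions made for illustration, and `linear_feats` stands in for whatever per-bin RP representation is actually used.

```python
import numpy as np

def hz_to_mel(hz):
    # Common Mel-scale formula (an assumption; several variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular Mel filters, shape (n_filters, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # n_filters + 2 edge frequencies, equally spaced on the Mel scale.
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2))
    bank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def to_mel_scale(linear_feats, bank):
    """Map (n_frames, n_bins) linear-scale features to (n_frames, n_filters).

    Each output is a weighted average of the input bins under one filter.
    """
    norm = bank.sum(axis=1)
    norm[norm == 0.0] = 1.0  # guard against empty filters
    return linear_feats @ bank.T / norm

def fuse_scores(score_a, score_b, weight=0.5):
    """Hypothetical linear score-level fusion of two detectors."""
    return weight * np.asarray(score_a, dtype=float) + (1.0 - weight) * np.asarray(score_b, dtype=float)

if __name__ == "__main__":
    bank = mel_filter_bank(n_filters=20, n_fft=512, sample_rate=16000)
    frames = np.random.default_rng(0).standard_normal((5, 257))
    print(to_mel_scale(frames, bank).shape)
    print(fuse_scores([0.2, 1.5], [0.8, -0.5], weight=0.7))
```

A gammatone-scale variant would replace `mel_filter_bank` with a gammatone filter bank while keeping the same projection step. In practice the fusion weight would be tuned on a development set.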

Updated: 2019-06-10