Vocal Tract Length Perturbation for Text-Dependent Speaker Verification With Autoregressive Prediction Coding, IEEE Signal Processing Letters



IEEE Signal Processing Letters (IF 3.2), Pub Date: 2021-01-28, DOI: 10.1109/lsp.2021.3055180
Achintya Kr. Sarkar, Zheng-Hua Tan

In this letter, we propose a vocal tract length (VTL) perturbation method for text-dependent speaker verification (TD-SV), in which a set of TD-SV systems is trained, one for each VTL factor, and score-level fusion is applied to make the final decision. Next, we explore the bottleneck (BN) feature extracted by training deep neural networks with a self-supervised learning objective, autoregressive predictive coding (APC), for TD-SV and compare it with the well-studied speaker-discriminant BN feature. The proposed VTL method is then applied to the APC and speaker-discriminant BN features. Finally, we combine the VTL perturbation systems trained on MFCC and the two BN features in the score domain. Experiments are performed on the RedDots challenge 2016 database of TD-SV using short utterances with Gaussian mixture model-universal background model and i-vector techniques. Results show that the proposed methods significantly outperform the baselines.
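The score-level fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so the equal-weight mean used here, the optional weights, the decision threshold, and the names `fuse_scores` and `accept` are all assumptions for illustration, not the authors' implementation.

```python
from statistics import fmean


def fuse_scores(scores_by_factor, weights=None):
    """Fuse the scores that several TD-SV systems assign to one trial.

    scores_by_factor maps a VTL factor (hypothetical values such as 0.9, 1.0, 1.1)
    to the score from the system trained with that factor.
    The equal-weight mean is an assumed fusion rule, not taken from the paper.
    """
    if not scores_by_factor:
        raise ValueError("need at least one system score")
    factors = sorted(scores_by_factor)
    if weights is None:
        # Equal-weight fusion: plain average over all VTL systems.
        return fmean([scores_by_factor[f] for f in factors])
    # Weighted fusion: normalise by the sum of the weights.
    total = sum(weights[f] for f in factors)
    return sum(weights[f] * scores_by_factor[f] for f in factors) / total


def accept(scores_by_factor, threshold=0.0):
    """Accept the claimed identity if the fused score reaches the threshold (assumed value)."""
    return fuse_scores(scores_by_factor) >= threshold
```

For example, `fuse_scores({0.9: 1.0, 1.0: 2.0, 1.1: 3.0})` averages the three system scores. The same function could combine systems built on different features (MFCC and the two BN features) by using one key per system.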


Updated: 2021-02-26
