Data augmentation versus noise compensation for x-vector speaker recognition systems in noisy environments, arXiv - CS - Computation and Language


arXiv - CS - Computation and Language, Pub Date: 2020-06-29, DOI: arxiv-2006.15903
Mohammad Mohammadamini (LIA), Driss Matrouf (LIA)

The explosion of available speech data and new speaker modeling methods based on deep neural networks (DNNs) have made it possible to develop more robust speaker recognition systems. Among DNN speaker modeling techniques, the x-vector system has shown a degree of robustness in noisy environments. Previous studies suggest that increasing the number of speakers in the training data and using data augmentation yield more robust speaker recognition systems in noisy environments. In this work, we ask whether explicit noise compensation techniques remain effective despite the general noise robustness of these systems. For this study, we use two different x-vector networks: the first is trained on VoxCeleb1 (Protocol1), and the second is trained on VoxCeleb1+VoxCeleb2 (Protocol2). We propose adding a denoising x-vector subsystem before scoring. Experimental results show that the x-vector system used in Protocol2 is more robust than the one used in Protocol1. Despite this observation, we show that explicit noise compensation gives almost the same relative gain in equal error rate (EER) in both protocols. For example, in Protocol2, denoising techniques yield a 21% to 66% improvement in EER.
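The abstract reports results as a relative gain in EER. As a rough illustration of how such a figure could be computed from trial scores, here is a minimal sketch. It is not the authors' evaluation code: the function names `compute_eer` and `relative_eer_gain`, the threshold sweep, and the toy scores are all assumptions made for this example.

```python
import numpy as np


def compute_eer(target_scores, nontarget_scores):
    """Approximate the equal error rate by sweeping observed scores as thresholds.

    Hypothetical helper for illustration only; real toolkits usually
    interpolate the ROC/DET curve instead of picking the nearest threshold.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    # False rejection rate: fraction of target trials scored below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance rate: fraction of non-target trials at or above the threshold.
    far = np.array([(nontarget >= t).mean() for t in thresholds])
    # The EER is taken where the two error rates are closest to each other.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


def relative_eer_gain(eer_baseline, eer_denoised):
    """Relative EER improvement in percent: 100 * (baseline - denoised) / baseline."""
    return 100.0 * (eer_baseline - eer_denoised) / eer_baseline


if __name__ == "__main__":
    # Toy scores, invented for this example (not data from the paper).
    targets = [0.6, 0.4, 0.9, 0.8]
    nontargets = [0.5, 0.1, 0.2, 0.3]
    print("EER:", compute_eer(targets, nontargets))
    # Hypothetical EER values before and after denoising.
    print("Relative gain (%):", relative_eer_gain(10.0, 5.0))
```

Under this definition, a relative gain of 21% means the EER after denoising is 79% of the baseline EER, whatever the absolute baseline value is. That is why two systems with different absolute robustness can show similar relative gains.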


Updated: 2020-06-30
