A Principle Solution for Enroll-Test Mismatch in Speaker Recognition, IEEE/ACM Transactions on Audio, Speech, and Language Processing



IEEE/ACM Transactions on Audio, Speech, and Language Processing (IF 4.1), Pub Date: 2022-01-05, DOI: 10.1109/taslp.2022.3140558
Lantian Li, Dong Wang, Jiawen Kang, Renyu Wang, Jing Wu, Zhendong Gao, Xiao Chen

Mismatch between enrollment and test conditions causes serious performance degradation in speaker recognition systems. This paper presents a statistics decomposition (SD) approach to solve this problem. The approach decomposes the PLDA score into three components that correspond to enrollment, prediction, and normalization, respectively. Provided that the correct statistics are used in each component, the resulting score is theoretically optimal. A comprehensive experimental study was conducted on three datasets with different types of mismatch: (1) physical channel mismatch, (2) long-term speaker characteristics mismatch, and (3) near-far recording mismatch. The results demonstrate that the proposed SD approach is highly effective and outperforms the ad hoc multi-condition training approach, which is commonly adopted but not optimal in theory.
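To make the three-component view concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simplified diagonal two-covariance PLDA model with a zero global mean, where each speaker mean is drawn from N(0, between_var) and each observation from N(speaker mean, within_var). The function names (`enroll`, `predict`, `normalize`, `plda_score`) and the idea of passing one pair of statistics for enrollment and another for the test side are illustrative assumptions based only on the abstract; the paper's exact formulation may differ.

```python
import numpy as np


def gaussian_logpdf(x, mean, var):
    """Log density of a diagonal Gaussian, summed over dimensions."""
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)))


def enroll(enroll_vecs, between_var, within_var):
    """Enrollment: posterior mean and variance of the speaker mean."""
    n = enroll_vecs.shape[0]
    xbar = enroll_vecs.mean(axis=0)
    denom = n * between_var + within_var
    post_mean = n * between_var / denom * xbar
    post_var = between_var * within_var / denom
    return post_mean, post_var


def predict(test_vec, post_mean, post_var, within_var):
    """Prediction: log p(test | enrollment) under the speaker posterior."""
    return gaussian_logpdf(test_vec, post_mean, post_var + within_var)


def normalize(test_vec, between_var, within_var):
    """Normalization: log p(test) marginalized over all speakers."""
    zero_mean = np.zeros_like(test_vec)
    return gaussian_logpdf(test_vec, zero_mean, between_var + within_var)


def plda_score(enroll_vecs, test_vec, enroll_stats, test_stats):
    """Log-likelihood ratio built from the three components.

    enroll_stats and test_stats are (between_var, within_var) pairs.
    Using separate pairs per condition is an assumption of this sketch.
    """
    post_mean, post_var = enroll(enroll_vecs, *enroll_stats)
    between_t, within_t = test_stats
    return (predict(test_vec, post_mean, post_var, within_t)
            - normalize(test_vec, between_t, within_t))


# Toy statistics: the same values for both conditions (no mismatch).
stats = (np.array([4.0, 4.0]), np.array([1.0, 1.0]))
enroll_vecs = np.array([[2.0, 2.0], [2.2, 1.8]])
same_speaker = plda_score(enroll_vecs, np.array([2.1, 1.9]), stats, stats)
diff_speaker = plda_score(enroll_vecs, np.array([-2.0, -2.0]), stats, stats)
```

In this toy setup the test vector close to the enrollment vectors receives a positive score and the distant one a negative score. Under mismatch, the sketch would be called with different `enroll_stats` and `test_stats`, so that each component is evaluated with the statistics of the condition it refers to.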


Updated: 2022-01-05
