Task-Driven Variability Model for Speaker Verification
Circuits, Systems, and Signal Processing (IF 2.3), Pub Date: 2019-11-27, DOI: 10.1007/s00034-019-01315-7
Chen Chen, Jiqing Han

The total variability model (TVM)/probabilistic linear discriminant analysis (PLDA) framework is one of the most popular methods for speaker verification. In this framework, i-vector representations are first extracted from utterances via an estimated TVM and then used to estimate the PLDA parameters for classification. Because the TVM and PLDA are estimated serially, any information lost in the TVM is inherited by the i-vectors and passed on to the PLDA classifier. More seriously, the PLDA cannot compensate for this loss. To solve this problem, we propose a task-driven variability model (TDVM) that jointly estimates the TVM and the PLDA classifier. In this method, feedback from the PLDA supervises the TVM solution, moving it toward the space with maximum between-class separation and minimum within-class variation. This space is also suitable for open-set testing, in which unenrolled speakers must be handled. Unlike most embedding methods, which extract embedding representations through stacked network structures, the TDVM makes explicit assumptions about latent variables, which improves the interpretability of speaker representation extraction. The proposed method is evaluated on the King-ASR-010 and VoxCeleb databases, and the experimental results show that the TDVM achieves better performance than the traditional TVM/PLDA framework and a VGG-M network trained with different cost functions.
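
For orientation, the serial TVM/PLDA pipeline described in the abstract can be written in its standard textbook form. This is only a sketch in commonly used notation, which is an assumption here: the paper's own symbols and its joint TDVM objective do not appear on this page and are not reproduced.

```latex
% Total variability model: the utterance-dependent GMM mean supervector M
% is a low-rank offset from the universal background model supervector m.
M = m + T w, \qquad w \sim \mathcal{N}(0, I)
% The i-vector is the posterior mean of w given the utterance's features.

% Simplified Gaussian PLDA on the i-vector w_{ij} of utterance j from speaker i:
w_{ij} = \mu + V y_i + \varepsilon_{ij}, \qquad
y_i \sim \mathcal{N}(0, I), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \Sigma)

% Verification score for a trial (w_1, w_2): log-likelihood ratio of the
% same-speaker hypothesis H_s against the different-speaker hypothesis H_d.
s(w_1, w_2) = \log \frac{p(w_1, w_2 \mid H_s)}{p(w_1 \mid H_d)\, p(w_2 \mid H_d)}
```

In the serial scheme, T is fitted first without speaker labels, and V and \Sigma are then fitted on the resulting i-vectors. Anything discarded by T is therefore unavailable to the PLDA stage, which is the loss the abstract refers to.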

Updated: 2019-11-27
