Speaker-Phrase-Specific Adaptation of PLDA Model for Improved Performance in Text-Dependent Speaker Verification
Circuits, Systems, and Signal Processing (IF 2.3), Pub Date: 2021-04-10, DOI: 10.1007/s00034-021-01713-w
Mohammad Azharuddin Laskar, Chuya China Bhanja, Rabul Hussain Laskar

The i-vector/probabilistic linear discriminant analysis (PLDA) framework has long been widely used in speaker verification. More recently, the introduction of online i-vectors and their integration with dynamic time warping template matching have significantly improved the performance of text-dependent speaker verification systems. The PLDA model learns to discriminate among instances of different speaker-phrase classes and also compensates for channel and session variability. However, when exposed to unseen speakers and text, the variability compensation model becomes suboptimal, leading to substantial verification error. This paper proposes PLDA adaptation to incorporate speaker-phrase-dependent variability into the i-vector/PLDA technique. The adapted model is tuned to a particular speaker-phrase class, which yields a better-matched solution. Two adaptation techniques, interpolation and weighted likelihood, are explored. Experiments on Part 1 of the RSR2015 database show relative equal error rate (EER) reductions of up to 58.22% and 45% for the interpolation and weighted likelihood techniques, respectively. Using speaker-phrase-specific mean and whitening parameters brings further improvement, with an EER reduction of up to 20% relative to that of the adapted models.
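The abstract names interpolation as one adaptation technique and reports results as relative EER reductions, but it gives no formulas. The Python sketch below is only an illustration under stated assumptions: it takes interpolation to mean a convex combination of a general covariance matrix and a speaker-phrase-specific one, a form commonly used for PLDA adaptation, which may differ from the paper's exact formulation. The function names, the weight `alpha`, and the toy matrices are hypothetical and do not come from the paper.

```python
from typing import List

Matrix = List[List[float]]


def interpolate(general: Matrix, specific: Matrix, alpha: float) -> Matrix:
    """Convex combination: alpha * specific + (1 - alpha) * general.

    Assumed form of interpolation-based adaptation; not taken from the paper.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    same_shape = len(general) == len(specific) and all(
        len(g_row) == len(s_row) for g_row, s_row in zip(general, specific)
    )
    if not same_shape:
        raise ValueError("matrices must have the same shape")
    return [
        [alpha * s + (1.0 - alpha) * g for g, s in zip(g_row, s_row)]
        for g_row, s_row in zip(general, specific)
    ]


def relative_eer_reduction(baseline_eer: float, new_eer: float) -> float:
    """Relative reduction in percent: 100 * (baseline - new) / baseline."""
    if baseline_eer <= 0.0:
        raise ValueError("baseline EER must be positive")
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


if __name__ == "__main__":
    # Toy 2x2 covariance matrices, made up purely for illustration.
    general_cov = [[2.0, 0.0], [0.0, 2.0]]
    specific_cov = [[1.0, 0.5], [0.5, 1.0]]
    print(interpolate(general_cov, specific_cov, alpha=0.5))
    # Hypothetical EER values in percent, not results from the paper.
    print(relative_eer_reduction(5.0, 4.0))
```

With `alpha = 0` the general model is returned unchanged, and with `alpha = 1` it is replaced entirely by the speaker-phrase-specific statistics; intermediate values trade robustness of the general model against specificity. The second helper makes explicit that a relative reduction is measured against the baseline EER, not as an absolute difference in percentage points.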



Updated: 2021-04-10