Cosine metric learning based speaker verification
Speech Communication (IF 2.4), Pub Date: 2020-02-20, DOI: 10.1016/j.specom.2020.02.003
Zhongxin Bai , Xiao-Lei Zhang , Jingdong Chen

The performance of speaker verification depends on the overlap region between the decision scores of target and impostor trials. Motivated by the fact that this overlap region can be reduced by maximizing the between-class distance while minimizing the within-class variance of the trials, we present in this paper two cosine metric learning (CML) back-end algorithms. The first, named m-CML, aims to enlarge the between-class distance, with a regularization term to control the within-class variance. The second, named v-CML, attempts to reduce the within-class variance, with a regularization term that prevents the between-class distance from shrinking. The regularization terms in both CML methods can be initialized by a traditional channel compensation method, e.g., linear discriminant analysis. The two algorithms are combined with front-end processing for speaker verification. To validate their effectiveness, m-CML is paired with an i-vector front-end, since it is good at enlarging the between-class distance of Gaussian score distributions, while v-CML is paired with a d-vector or x-vector front-end, as it can significantly reduce the within-class variance of heavy-tailed score distributions. Experimental results on the NIST and SITW speaker recognition evaluation corpora show that the proposed algorithms outperform the channel compensation methods used for their initialization, and are competitive with the probabilistic linear discriminant analysis back-end. For comparison, we also apply both m-CML and v-CML to the i-vector and x-vector front-ends.
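The quantities the abstract optimizes — cosine similarity between speaker embeddings, the between-class distance of the resulting scores, and their within-class variance — can be illustrated with a minimal sketch. This is not the paper's algorithm; the function names and the choice of statistics (class-mean gap, sum of per-class variances) are illustrative assumptions:

```python
import math

def cosine_score(x, y):
    # Cosine similarity between two speaker embeddings
    # (e.g. i-vectors or x-vectors), the back-end scoring metric.
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)

def score_statistics(target_scores, impostor_scores):
    # Between-class distance: gap between the mean target score and the
    # mean impostor score (m-CML aims to enlarge this).
    # Within-class variance: total spread of scores inside each class
    # (v-CML aims to shrink this). Both shrink the score-overlap region.
    mt = sum(target_scores) / len(target_scores)
    mi = sum(impostor_scores) / len(impostor_scores)
    between = mt - mi

    def variance(scores, mean):
        return sum((s - mean) ** 2 for s in scores) / len(scores)

    within = variance(target_scores, mt) + variance(impostor_scores, mi)
    return between, within
```

For example, target scores clustered near 1.0 and impostor scores near 0.0 yield a large between-class distance and a small within-class variance, i.e. a small overlap region and fewer verification errors.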




Updated: 2020-02-20