Automatic speaker verification from affective speech using Gaussian mixture model based estimation of neutral speech characteristics
Speech Communication (IF 3.2) Pub Date: 2021-05-26, DOI: 10.1016/j.specom.2021.05.009
Anderson R. Avila, Douglas O’Shaughnessy, Tiago H. Falk

Intra-speaker variability, caused by emotional speech, is a real threat to the performance of speaker recognition systems. In fact, as human beings, we are constantly changing our emotional state. While many efforts have been made to increase automatic speaker verification (ASV) robustness towards channel effects or spoofing attacks, only a handful of studies have addressed the detrimental consequences of affective speech. In this paper, we propose a new method to minimize the mismatch between neutral and affective speech. To this end, a Gaussian mixture model is used to learn a prior probability distribution of the neutral speech for a given speaker (i.e., characterizing his/her source space). This knowledge is then used to minimize the differences between target (affective) and source (neutral) spaces. The proposed method is validated across four multi-lingual emotional datasets. Experimental results show a consistent improvement in performance across eight emotional states, with significant reductions of equal error rate relative to the baseline.
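
The abstract reports its results as equal error rate (EER), the operating point at which the false acceptance rate and the false rejection rate of a verification system coincide. As a minimal sketch of how that metric can be computed from trial scores (this is not the authors' evaluation code; the function name `equal_error_rate` and the toy scores are illustrative assumptions), one could write:

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Return (eer, threshold) for two lists of verification scores.

    Assumption: a trial is accepted when its score is >= threshold,
    and higher scores mean "more likely the claimed speaker".
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")

    best_gap = None
    best = None
    # Use every observed score as a candidate decision threshold.
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection rate: genuine trials scored below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance rate: impostor trials scored at or above it.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best = ((far + frr) / 2.0, threshold)
    return best


# Toy scores (hypothetical): genuine trials vs. impostor trials.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, impostor)
print(eer, threshold)  # 0.25 0.6
```

In this toy case one genuine trial (0.4) falls below the threshold and one impostor trial (0.6) reaches it, so both error rates are 25%. A mismatch between neutral enrollment speech and affective test speech would typically lower genuine-trial scores and so raise the EER, which is the effect the proposed method aims to reduce.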




Updated: 2021-06-01