Speaker verification based on the fusion of speech acoustics and inverted articulatory signals.
Computer Speech & Language (IF 3.1), Pub Date: 2016-03-01, DOI: 10.1016/j.csl.2015.05.003
Ming Li, Jangwon Kim, Adam Lammert, Prasanta Kumar Ghosh, Vikram Ramanarayanan, Shrikanth Narayanan

We propose a practical feature-level and score-level fusion approach that combines acoustic and estimated articulatory information for both text-independent and text-dependent speaker verification. From a practical point of view, we study how speaker verification performance can be improved by combining dynamic articulatory information with conventional acoustic features. For text-independent speaker verification, we find that concatenating articulatory features obtained from measured speech production data with conventional Mel-frequency cepstral coefficients (MFCCs) improves performance dramatically. However, since directly measuring articulatory data is not feasible in many real-world applications, we also experiment with estimated articulatory features obtained through acoustic-to-articulatory inversion. We explore both feature-level and score-level fusion and find that overall system performance is significantly enhanced even with estimated articulatory features. This performance boost could be due to the inter-speaker variation information embedded in the estimated articulatory features. Since the dynamics of articulation carry important information, we also include inverted articulatory trajectories in text-dependent speaker verification. We demonstrate that the articulatory constraints introduced by inverted articulatory features help to reject wrong-password trials and improve performance after score-level fusion. We evaluate the proposed methods on the X-ray Microbeam database and the RSR2015 database for the two tasks, respectively. Experimental results show a relative equal error rate reduction of more than 15% on both speaker verification tasks.
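To make the two fusion strategies concrete, below is a minimal Python/NumPy sketch of frame-level feature concatenation and linear score-level fusion. It is an illustration under stated assumptions, not the authors' implementation: the function names, array shapes, and the fusion weight w are hypothetical.

```python
import numpy as np

def feature_level_fusion(mfcc: np.ndarray, artic: np.ndarray) -> np.ndarray:
    """Concatenate acoustic and (estimated) articulatory features per frame.

    mfcc:  (T, D_a) matrix of MFCC frames.
    artic: (T, D_p) matrix of articulatory trajectories, time-aligned to mfcc.
    Returns a (T, D_a + D_p) fused feature matrix for the verification backend.
    """
    assert mfcc.shape[0] == artic.shape[0], "feature streams must be frame-aligned"
    return np.hstack([mfcc, artic])

def score_level_fusion(acoustic_score: float, articulatory_score: float,
                       w: float = 0.5) -> float:
    """Linearly fuse the verification scores of the two subsystems.

    w is a hypothetical fusion weight; the fused score is compared against a
    threshold to accept or reject the claimed speaker identity.
    """
    return w * acoustic_score + (1.0 - w) * articulatory_score
```

In practice the fused features (or, for score-level fusion, the two separate feature streams) would feed a standard verification backend, with the weight w and the decision threshold calibrated on held-out development data.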
