Evaluation of Glottal Inverse Filtering Algorithms Using a Physiologically Based Articulatory Speech Synthesizer.
IEEE/ACM Transactions on Audio, Speech, and Language Processing (IF 5.4), Pub Date: 2017-06-12, DOI: 10.1109/taslp.2017.2714839
Yu-Ren Chien 1 , Daryush D Mehta 2 , Jón Guðnason 1 , Matías Zañartu 3 , Thomas F Quatieri 4

Glottal inverse filtering aims to estimate the glottal airflow signal from a speech signal for applications such as speaker recognition and clinical voice assessment. Nonetheless, evaluating inverse filtering algorithms has been challenging because glottal airflow is difficult to measure directly in practice. In addition, the performance of many methods is known to degrade in voice conditions of great interest, such as breathiness, high pitch, soft voice, and running speech. This paper presents a comprehensive, objective, and comparative evaluation of state-of-the-art inverse filtering algorithms that takes advantage of speech and glottal airflow signals generated by a physiological speech synthesizer. The synthesizer provides a physics-based simulation of the voice production process and is thus an adequate test bed for revealing the temporal and spectral performance characteristics of each algorithm. The synthetic data include continuous speech utterances and sustained vowels produced with multiple voice qualities (pressed, slightly pressed, modal, slightly breathy, and breathy), fundamental frequencies, and subglottal pressures to simulate the natural variations of real speech. To evaluate the accuracy of a glottal flow estimate, multiple error measures are used: an error in the estimated signal that measures overall waveform deviation, and an error in each of several clinically relevant features extracted from the glottal flow estimate. In the glottal flow estimation experiments, mean waveform errors were around 30% of the amplitude of the true glottal flow derivative for sustained vowels and around 40% for continuous speech. Closed-phase approaches showed remarkable stability across different voice qualities and subglottal pressures.
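To make the idea of an error "relative to the amplitude of the true glottal flow derivative" concrete, here is a minimal sketch. It assumes the error is the RMS difference between the estimated and true glottal flow derivative, divided by the true signal's peak-to-peak amplitude; the exact normalization used in the paper may differ, and the function name `normalized_waveform_error` is hypothetical. The sketch also assumes the two signals are already time-aligned and gain-matched, which an actual evaluation may need to handle first.

```python
import math


def normalized_waveform_error(estimate, reference):
    """Hypothetical waveform error measure (assumption, not the paper's code).

    Returns the RMS deviation between an estimated and a true glottal flow
    derivative as a percentage of the true signal's peak-to-peak amplitude.
    Both sequences must already be aligned in time and matched in gain.
    """
    if len(estimate) != len(reference) or not reference:
        raise ValueError("signals must be non-empty and of equal length")
    # Root-mean-square of the sample-wise deviation.
    squared = sum((e - r) ** 2 for e, r in zip(estimate, reference))
    rms = math.sqrt(squared / len(reference))
    # Peak-to-peak amplitude of the true glottal flow derivative.
    amplitude = max(reference) - min(reference)
    if amplitude == 0:
        raise ValueError("reference signal has zero amplitude")
    return 100.0 * rms / amplitude
```

Under this assumed definition, an estimate whose RMS deviation is about a third of the true derivative's peak-to-peak swing would score roughly 30%.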
As suggested by significance tests, the algorithms of choice are closed-phase covariance analysis for sustained vowels and sparse linear prediction for continuous speech. Results of a data subset analysis further suggest that close rounded vowels pose an additional challenge for glottal flow estimation.
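Closed-phase covariance analysis can be illustrated with a minimal sketch: linear prediction coefficients are fitted with the covariance method using only samples that fall in the glottal closed phase, where the vocal tract is assumed to ring freely, and the speech signal is then passed through the resulting inverse filter A(z). Because lip radiation is commonly modeled as a differentiator, the filter output is usually interpreted as an estimate of the glottal flow derivative. The closed-phase sample indices are assumed to be known in advance (for example from glottal closure detection); the function names are hypothetical, and the sketch omits pre-emphasis, filter stabilization, and any other refinements the evaluated implementation may use.

```python
def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def closed_phase_covariance_lp(signal, closed_phase, order):
    """Hypothetical helper: covariance-method LP using only closed-phase samples.

    Minimizes sum over n in closed_phase of (s[n] - sum_k a_k s[n-k])^2,
    i.e. solves sum_k a_k phi(i, k) = phi(i, 0) for i = 1..order.
    """
    samples = [n for n in closed_phase if order <= n < len(signal)]
    if len(samples) < order:
        raise ValueError("not enough closed-phase samples for this order")
    phi = [[sum(signal[n - i] * signal[n - k] for n in samples)
            for k in range(order + 1)] for i in range(order + 1)]
    matrix = [[phi[i][k] for k in range(1, order + 1)] for i in range(1, order + 1)]
    rhs = [phi[i][0] for i in range(1, order + 1)]
    return solve(matrix, rhs)


def inverse_filter(signal, coeffs):
    """Apply A(z) = 1 - sum_k a_k z^-k; samples before index 0 are treated as zero."""
    return [signal[n] - sum(a * signal[n - k]
                            for k, a in enumerate(coeffs, start=1) if n - k >= 0)
            for n in range(len(signal))]
```

On a synthetic all-pole signal driven by isolated impulses, fitting only on samples between the impulses recovers the all-pole coefficients, and the inverse-filter output reduces to the impulse train; with real speech the closed-phase fit is only approximate.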

Updated: 2017-06-12