Sigma-Lognormal Modeling of Speech
Cognitive Computation (IF 4.3), Pub Date: 2021-02-07, DOI: 10.1007/s12559-020-09803-8
C. Carmona-Duarte, M. A. Ferrer, R. Plamondon, A. Gómez-Rodellar, P. Gómez-Vilda

Human movement studies and analyses have been fundamental in many scientific domains, ranging from neuroscience to education, pattern recognition to robotics, health care to sports, and beyond. Previous speech motor models were proposed to understand how speech movement is produced and how the resulting speech varies when some parameters are changed. However, the inverse approach, in which the muscular response parameters and the subject’s age are derived from real continuous speech, is not possible with such models. In the handwriting field, by contrast, the kinematic theory of rapid human movements and its associated Sigma-lognormal model have been applied successfully to obtain the muscular response parameters. This work presents a speech kinematics-based model that can be used to study, analyze, and reconstruct complex speech kinematics in a simplified manner. A method based on the kinematic theory of rapid human movements and its associated Sigma-lognormal model is applied to describe and parameterize the asymptotic impulse response of the neuromuscular networks involved in speech in response to a neuromotor command. The method used to transform formants into a movement observation is also presented. Experiments carried out with the (English) VTR-TIMIT database and the (German) Saarbrucken Voice Database, including people of different ages, with and without laryngeal pathologies, corroborate the link between the extracted parameters and aging, on the one hand, and the proportion between the first and second formants required in applying the kinematic theory of rapid human movements, on the other. The results should drive innovative developments in the modeling and understanding of speech kinematics.
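The abstract does not state the model equations. As a rough illustration only, the sketch below uses the form in which the Sigma-lognormal model is commonly written in the kinematic theory literature: each neuromotor command produces a lognormal speed profile, and the observed velocity is the vector sum of those profiles. The function names and the parameter names (`D`, `t0`, `mu`, `sigma`, `theta_s`, `theta_e`) are assumptions made for this example, not the authors' implementation, and the transformation from formants to movement is not covered here.

```python
import math


def lognormal_speed(t, D, t0, mu, sigma):
    """Speed of one lognormal component at time t.

    D is the amplitude (area under the speed curve), t0 the command onset,
    mu the log time delay and sigma the log response time. The speed is
    zero at or before t0.
    """
    if t <= t0:
        return 0.0
    x = t - t0
    norm = D / (sigma * math.sqrt(2.0 * math.pi) * x)
    return norm * math.exp(-((math.log(x) - mu) ** 2) / (2.0 * sigma ** 2))


def sigma_lognormal_velocity(t, strokes):
    """Vector sum of lognormal components at time t.

    Each stroke is a tuple (D, t0, mu, sigma, theta_s, theta_e), where
    theta_s and theta_e are the assumed start and end angles in radians.
    """
    vx = vy = 0.0
    for D, t0, mu, sigma, theta_s, theta_e in strokes:
        if t <= t0:
            continue
        # Fraction of the stroke completed so far (lognormal CDF).
        z = (math.log(t - t0) - mu) / (sigma * math.sqrt(2.0))
        done = 0.5 * (1.0 + math.erf(z))
        # The direction moves from theta_s to theta_e as the stroke progresses.
        phi = theta_s + (theta_e - theta_s) * done
        speed = lognormal_speed(t, D, t0, mu, sigma)
        vx += speed * math.cos(phi)
        vy += speed * math.sin(phi)
    return vx, vy
```

In this form, fitting an observed velocity trace means searching for the stroke parameters whose summed profiles best match it; those fitted values play the role of the muscular response parameters that the abstract relates to aging.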




Updated: 2021-02-07