Effects of the piriform fossae, transvelar acoustic coupling, and laryngeal wall vibration on the naturalness of articulatory speech synthesis
Speech Communication (IF 3.2), Pub Date: 2021-06-10, DOI: 10.1016/j.specom.2021.06.002
Peter Birkholz, Susanne Drechsel

Acoustic models of the vocal tract for articulatory speech synthesis often neglect a range of acoustic effects that are known to exist in the human vocal tract. Here we extended a basic acoustic vocal tract model with three features: the piriform fossae, transvelar acoustic coupling of the oral and nasal cavities, and sound radiation from the skin of the neck. The main goal was to find out how these features affect the naturalness of the synthesized speech. To this end, ten German words were synthesized with different combinations of the additional features, and listeners compared the naturalness of these stimuli. Surprisingly, all three features reduced the perceived naturalness, although they should make the synthesis more realistic. A closer analysis revealed that all three features emphasized the low frequencies of the synthetic speech relative to the high frequencies, which made the speech sound slightly more muffled with the glottal excitation used. An additional perception experiment using synthetic stimuli with a slightly tenser voice revealed no perceptual preference for the synthesis with or without the piriform fossae. These results indicate that the examined features play a minor role in the naturalness of articulatory synthesis compared to the voice source characteristics.




Updated: 2021-06-17