Effect of articulatory and acoustic features on the intelligibility of speech in noise: An articulatory synthesis study


Speech Communication (IF 2.4), Pub Date: 2020-01-22, DOI: 10.1016/j.specom.2020.01.004
Thuanvan Ngo, Masato Akagi, Peter Birkholz

In noisy conditions, speakers involuntarily change their manner of speaking to enhance the intelligibility of their voices. The increased intelligibility of this so-called Lombard speech results from changes in multiple articulatory and acoustic features. While the major features of Lombard speech are well known from previous studies, little is known about their relative contributions to the intelligibility of speech in noise. This study used an analysis-by-synthesis strategy to explore the contributions of several of these features. To this end, an articulatory speech synthesizer was used to synthesize the ten German digit words “Null” to “Neun” for all 16 combinations of four binary features: modal vs. pressed phonation, normal vs. increased F₁ and F₂ formant frequencies, normal vs. increased f₀ mean and range, and normal vs. increased vowel duration. Subjects were asked to recognize the synthesized words in the presence of strong pink noise and babble noise. Compared to “plain” speech, the word recognition rate was improved most by pressed phonation, followed by increased f₀ mean and range, and then by increased formant frequencies. Increased vowel duration slightly reduced the recognition rate in pink noise but had no effect in babble noise.
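The 2 × 2 × 2 × 2 design described in the abstract can be sketched as a simple enumeration. The following is only an illustrative sketch, not the authors' code: the dictionary keys, level labels, and function names are hypothetical, and it assumes one stimulus per word and feature combination, which the abstract does not state explicitly.

```python
from itertools import product

# Four binary features named in the abstract.
# Keys and level labels are hypothetical names chosen for this sketch.
FEATURES = {
    "phonation": ("modal", "pressed"),
    "formants_F1_F2": ("normal", "increased"),
    "f0_mean_and_range": ("normal", "increased"),
    "vowel_duration": ("normal", "increased"),
}

# The ten German digit words from "Null" to "Neun".
DIGITS = ["Null", "Eins", "Zwei", "Drei", "Vier",
          "Fünf", "Sechs", "Sieben", "Acht", "Neun"]


def build_conditions():
    """Return every combination of feature levels (full factorial design)."""
    names = list(FEATURES)
    return [dict(zip(names, levels)) for levels in product(*FEATURES.values())]


def build_stimuli():
    """Pair each digit word with each condition (assumed: one stimulus each)."""
    return [(word, cond) for cond in build_conditions() for word in DIGITS]


conditions = build_conditions()
stimuli = build_stimuli()
print(len(conditions))  # 16 feature combinations
print(len(stimuli))     # 160 word/condition pairs under the assumption above
print(conditions[0])    # all-baseline combination, i.e. "plain" speech
```

The first entry of `conditions` is the all-baseline combination, corresponding to what the abstract calls "plain" speech; the remaining 15 entries switch on one or more of the Lombard features.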


Updated: 2020-01-22
