Do 'Dominant Frequencies' explain the listener's response to formant and spectrum shape variations?
Speech Communication (IF 3.2), Pub Date: 2008-12-24, DOI: 10.1016/j.specom.2008.12.003
Björn Lindblom, Randy Diehl, Carl Creeger

Psychoacoustic experimentation shows that formant frequency shifts can give rise to more significant changes in phonetic vowel timbre than differences in overall level, bandwidth, spectral tilt, and formant amplitudes. Carlson and Granström’s perceptual and computational findings suggest that, in addition to spectral representations, the human ear uses temporal information on formant periodicities (‘Dominant Frequencies’) in building vowel timbre percepts. The availability of such temporal coding in the cat’s auditory nerve fibers has been demonstrated in numerous physiological investigations undertaken during recent decades. In this paper we explore, and provide further support for, the Dominant Frequency (DF) hypothesis using KONVERT, a computational auditory model. KONVERT provides auditory excitation patterns for vowels by performing a critical-band analysis. It simulates phase locking in auditory neurons and outputs DF histograms. The modeling supports the assumption that listeners judge phonetic distance among vowels on the basis of formant frequency differences, as determined primarily by a time-based analysis. However, when instructed to judge psychophysical distance among vowels, they can also use spectral differences such as formant bandwidth, formant amplitudes, and spectral tilt. Although there has been considerable debate among psychoacousticians about the functional role of phase locking in monaural hearing, the present research suggests that detailed temporal information may nonetheless play a significant role in speech perception.
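The abstract does not describe KONVERT's internals, so the following is only a toy illustration of the general idea of a dominant-frequency histogram, not the authors' model. It assumes a bank of simple two-pole resonators as a stand-in for the critical-band analysis, and it uses upward zero-crossing intervals as a crude proxy for phase-locked timing. The function names (`resonator`, `dominant_frequency`, `df_histogram`), the relative bandwidth, the bin width, and the two-component test signal are all hypothetical choices made for this sketch.

```python
import math
from collections import Counter


def resonator(signal, fs, fc, bw):
    """Two-pole resonator centred near fc (Hz) with bandwidth bw (Hz).

    Assumption: a stand-in for one auditory channel, not KONVERT's filter.
    """
    r = math.exp(-math.pi * bw / fs)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * fc / fs)
    a2 = -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def dominant_frequency(channel, fs):
    """Estimate the dominant frequency from the mean interval between
    upward zero crossings (a crude proxy for phase-locked firing)."""
    ups = [i for i in range(1, len(channel))
           if channel[i - 1] < 0.0 <= channel[i]]
    if len(ups) < 2:
        return None
    return fs * (len(ups) - 1) / (ups[-1] - ups[0])


def df_histogram(signal, fs, centers, rel_bw=0.2, skip=0.02, bin_hz=100):
    """Count how many channels report each (binned) dominant frequency."""
    start = int(skip * fs)  # drop the filter's start-up transient
    hist = Counter()
    for fc in centers:
        channel = resonator(signal, fs, fc, rel_bw * fc)[start:]
        df = dominant_frequency(channel, fs)
        if df is not None:
            hist[int(round(df / bin_hz)) * bin_hz] += 1
    return hist


# Hypothetical "vowel": two sinusoids standing in for F1 = 500 Hz and F2 = 1500 Hz.
fs = 16000
t = [n / fs for n in range(int(0.2 * fs))]
vowel = [math.sin(2 * math.pi * 500 * x) + 0.5 * math.sin(2 * math.pi * 1500 * x)
         for x in t]
centers = list(range(200, 3100, 100))  # channel centre frequencies in Hz
hist = df_histogram(vowel, fs, centers)
```

In this sketch, channels whose output is dominated by one component tend to report that component's frequency, so the histogram clusters around the two formant-like components rather than around the channel centre frequencies. Changing the relative amplitude of the 1500 Hz component changes which channels each component captures, but it does not move the frequencies the channels report.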




Updated: 2008-12-24