Multimodal perceptual organization of speech: Evidence from tone analogs of spoken utterances.
Speech Communication (IF 3.2), Pub Date: 1998-10-01, DOI: 10.1016/s0167-6393(98)00050-8
Robert E. Remez, Jennifer M. Fellowes, David B. Pisoni, Winston D. Goh, Philip E. Rubin

Theoretical and practical motives alike have prompted recent investigations of multimodal speech perception. Theoretically, multimodal studies have extended the conceptualization of perceptual organization beyond the familiar modality-bound accounts deriving from Gestalt psychology. Practically, such investigations have been driven by a need to understand the proficiency of multimodal speech perception using an electrocochlear prosthesis for hearing. In each domain, studies have shown that perceptual organization of speech can occur even when the perceiver's auditory experience departs from natural speech qualities. Accordingly, our research examined auditory-visual multimodal integration of videotaped faces and selected acoustic constituents of speech signals, each realized as a single sinewave tone accompanying a video image of an articulating face. The single tone reproduced the frequency and amplitude of the phonatory cycle or of one of the lower three oral formants. Our results showed a distinct advantage for the condition pairing the video image of the face with a sinewave replicating the second formant, despite its unnatural timbre and its presentation in acoustic isolation from the rest of the speech signal. Perceptual coherence of multimodal speech in these circumstances is established when the two modalities concurrently specify the same underlying phonetic attributes.
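
The stimulus construction described above can be made concrete with a short sketch. The Python code below is not taken from the study; it only illustrates how a single tone can reproduce the frequency and amplitude contour of one formant by integrating the frequency track into a phase function. The second-formant (F2) and amplitude values, sampling rate, and frame rate are hypothetical placeholders.

```python
# Minimal sketch (illustrative, not the authors' code) of a sinewave
# replica of a single formant track: a tone whose instantaneous
# frequency and amplitude follow frame-by-frame formant estimates.
import numpy as np

def sinewave_replica(freq_hz, amp, fs=16000, frame_rate=100):
    """Synthesize a tone from frame-rate frequency (Hz) and amplitude tracks."""
    samples_per_frame = fs // frame_rate
    n = len(freq_hz) * samples_per_frame
    # Upsample the frame-rate tracks to the audio rate by linear interpolation.
    t_frames = np.arange(len(freq_hz)) * samples_per_frame
    t_audio = np.arange(n)
    f = np.interp(t_audio, t_frames, freq_hz)
    a = np.interp(t_audio, t_frames, amp)
    # Integrate the frequency track to obtain phase, then generate the tone.
    phase = 2 * np.pi * np.cumsum(f) / fs
    return a * np.sin(phase)

# Hypothetical F2 track: 500 ms of values at 100 frames per second.
f2_track = np.linspace(1200.0, 1800.0, 50)   # Hz
amp_track = np.hanning(50)                   # relative amplitude envelope
tone = sinewave_replica(f2_track, amp_track)
```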

Updated: 2019-11-01