Syllable Inference as a Mechanism for Spoken Language Understanding
Topics in Cognitive Science (IF 2.9). Pub Date: 2021-03-29. DOI: 10.1111/tops.12529
Meredith Brown 1, 2, 3 , Michael K Tanenhaus 1, 4 , Laura Dilley 5
A classic problem in spoken language comprehension is how listeners perceive speech as being composed of discrete words, given the variable time-course of information in continuous signals. We propose a syllable inference account of spoken word recognition and segmentation, according to which alternative hierarchical models of syllables, words, and phonemes are dynamically posited and are expected to maximally predict incoming sensory input. Generative models are combined with current estimates of context speech rate drawn from neural oscillatory dynamics, which are sensitive to amplitude rises. Over time, models that produce local minima in error between predicted and recently experienced signals give rise to perceptions of hearing words. Three experiments using the visual-world eye-tracking paradigm with a picture-selection task tested hypotheses motivated by this framework. Materials were sentences that were acoustically ambiguous in the numbers of syllables, words, and phonemes they contained (cf. English plural constructions, such as "saw (a) raccoon(s) swimming," which have two loci of grammatical information). Time-compressing or -expanding the speech materials permitted determination of how temporal information at, or in the context of, each locus affected looks to, and selection of, pictures with a singular or plural referent (e.g., one raccoon or more than one). Supporting our account, listeners probabilistically interpreted identical chunks of speech as consistent with a singular or plural referent, to a degree based on the chunk's gradient rate relative to its context. We interpret these results as evidence that arriving temporal information, judged against language-model predictions generated from context speech rate evaluated on a continuous scale, informs inferences about syllables, thereby giving rise to perceptual experiences of understanding spoken language as words separated in time.
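The rate-normalization idea at the core of this account can be sketched as a toy Bayesian comparison of candidate parses. This is only an illustrative sketch, not the authors' actual model: the parse hypotheses (syllable counts), the uniform prior, the Gaussian likelihood over durations, and all parameter values below are assumptions chosen to show how the same acoustic chunk can favor different interpretations depending on the context speech rate.

```python
import math

def parse_posterior(chunk_duration, context_rate, syllable_counts, sd=0.05):
    """Score candidate parses of an acoustically ambiguous chunk.

    Each candidate parse posits a number of syllables. Its predicted
    duration is that count divided by the context speech rate
    (syllables per second). A Gaussian likelihood (hypothetical
    parameterization) scores how well the observed duration matches
    each prediction, and scores are normalized into a posterior
    under a uniform prior over parses.
    """
    def likelihood(n):
        predicted = n / context_rate
        z = (chunk_duration - predicted) / sd
        return math.exp(-0.5 * z * z)

    scores = {n: likelihood(n) for n in syllable_counts}
    total = sum(scores.values())
    return {n: s / total for n, s in scores.items()}

# The same 0.6 s chunk, heard after fast vs. slow context speech.
# After fast context, the chunk's duration is long enough to fit an
# extra syllable (e.g., the article in "a raccoon"); after slow
# context, the fewer-syllable parse fits better.
fast = parse_posterior(0.6, context_rate=6.0, syllable_counts=(3, 4))
slow = parse_posterior(0.6, context_rate=4.0, syllable_counts=(3, 4))
```

With these illustrative numbers, the four-syllable parse dominates after fast context (predicted duration 4/6 ≈ 0.67 s, close to 0.6 s) while the three-syllable parse dominates after slow context (3/4 = 0.75 s is the nearer prediction), mirroring the gradient, rate-relative interpretation the abstract describes.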

Updated: 2021-04-29