COSMO-Onset: A neurally-inspired computational model of spoken word recognition, combining top-down prediction and bottom-up detection of syllabic onsets


Frontiers in Systems Neuroscience (IF 3.1), Pub Date: 2021-07-02, DOI: 10.3389/fnsys.2021.653975
Mamady Nabé (1, 2), Jean-Luc Schwartz (1), Julien Diard (2)


Recent neurocognitive models commonly consider speech perception as a hierarchy of processes, each corresponding to a specific temporal scale of collective oscillatory processes in the cortex: 30-80 Hz gamma oscillations in charge of phonetic analysis, 4-9 Hz theta oscillations in charge of syllabic segmentation, 1-2 Hz delta oscillations processing prosodic/syntactic units, and the 15-20 Hz beta channel possibly involved in top-down predictions. Several recent neuro-computational models thus feature theta oscillations, driven by the speech acoustic envelope, to achieve syllabic parsing before lexical access. However, it is unlikely that such syllabic parsing, performed in a purely bottom-up manner from envelope variations, would be fully effective in all situations, especially in adverse sensory conditions. We present a new probabilistic model of spoken word recognition, called COSMO-Onset, in which syllabic parsing relies on fusion between top-down, lexical prediction of onset events and bottom-up onset detection from the acoustic envelope. We report preliminary simulations, analyzing how the model performs syllabic parsing and phone, syllable and word recognition. We show that, while purely bottom-up onset detection is sufficient for word recognition in nominal conditions, top-down prediction of syllabic onset events makes it possible to overcome challenging adverse conditions, such as when the acoustic envelope is degraded, leading to either spurious or missing onset events in the sensory signal. This suggests a possible computational role for top-down, predictive processes during speech recognition, consistent with recent models of neuronal oscillatory processes.
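To make the idea of fusing bottom-up and top-down onset information more concrete, here is a deliberately simplified sketch. It is not the COSMO-Onset model itself, which is probabilistic: this toy version uses binary onset flags on a short hand-made envelope. The function names, the `rise_threshold` value, the envelopes, and the choice of logical AND and OR as fusion rules are all assumptions made for illustration, not details taken from the abstract.

```python
def bottom_up_onsets(envelope, rise_threshold=0.2):
    """Flag an onset wherever the envelope rises by more than
    rise_threshold between two consecutive samples (toy detector)."""
    if not envelope:
        return []
    flags = [False]  # the first sample has no predecessor
    for previous, current in zip(envelope, envelope[1:]):
        flags.append(current - previous > rise_threshold)
    return flags


def top_down_onsets(predicted_times, n_samples):
    """Flag the samples at which a (hypothetical) lexical prediction
    expects a syllabic onset."""
    predicted = set(predicted_times)
    return [t in predicted for t in range(n_samples)]


def fuse(bottom_up, top_down, mode):
    """Combine both onset streams sample by sample.
    'and' keeps onsets confirmed by both streams (filters spurious ones);
    'or' keeps onsets found by either stream (recovers missing ones)."""
    if mode == "and":
        return [b and t for b, t in zip(bottom_up, top_down)]
    if mode == "or":
        return [b or t for b, t in zip(bottom_up, top_down)]
    raise ValueError("mode must be 'and' or 'or'")


def onset_times(flags):
    """Return the sample indices at which an onset is flagged."""
    return [t for t, flag in enumerate(flags) if flag]


# Assumed lexical prediction: syllable onsets at samples 2 and 6.
prediction = top_down_onsets([2, 6], 10)

# Degraded envelope with an extra bump at sample 4 (spurious onset).
noisy = [0.0, 0.0, 0.8, 0.5, 0.9, 0.1, 0.7, 0.8, 0.2, 0.0]
print(onset_times(bottom_up_onsets(noisy)))                       # [2, 4, 6]
print(onset_times(fuse(bottom_up_onsets(noisy), prediction, "and")))  # [2, 6]

# Degraded envelope whose second rise is flattened (missing onset).
flat = [0.0, 0.0, 0.8, 0.9, 0.8, 0.8, 0.9, 0.8, 0.2, 0.0]
print(onset_times(bottom_up_onsets(flat)))                        # [2]
print(onset_times(fuse(bottom_up_onsets(flat), prediction, "or")))    # [2, 6]
```

In this toy setting, the bottom-up detector alone is misled by each degraded envelope, while combining it with the predicted onsets restores the intended segmentation at samples 2 and 6. Which fusion rule helps depends on the kind of degradation, which is why the sketch exposes it as a `mode` argument.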


Updated: 2021-07-02
