Language, Cognition and Neuroscience (IF 2.3), Pub Date: 2021-07-21, DOI: 10.1080/23273798.2021.1954207
Elnaz Shafaei-Bajestan, Masoumeh Moradipour-Tari, Peter Uhrig, R. Harald Baayen
ABSTRACT
A computational model for the comprehension of single spoken words is presented that builds on an earlier model using discriminative learning. Real-valued features are extracted from the speech signal instead of discrete features. Vectors representing word meanings using one-hot encoding are replaced by real-valued semantic vectors. Instead of incremental learning with Rescorla-Wagner updating, we use linear discriminative learning, which captures incremental learning at the limit of experience. These new design features substantially improve prediction accuracy for unseen words, and provide enhanced temporal granularity, enabling the modelling of cohort-like effects. Visualisation with t-SNE shows that the acoustic form space captures phone-like properties. Trained on 9 h of audio from a broadcast news corpus, the model achieves recognition performance that approximates the lower bound of human accuracy in isolated word recognition tasks. LDL-AURIS thus provides a mathematically-simple yet powerful characterisation of the comprehension of single words as found in English spontaneous speech.
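The contrast the abstract draws between incremental updating and linear discriminative learning can be illustrated with a small sketch. This is not the authors' implementation: the matrices are random toy data, the names `ldl_mapping`, `incremental_mapping`, and `recognise` are hypothetical, and the Widrow-Hoff (delta) rule is used here as an assumed real-valued counterpart of Rescorla-Wagner updating. The toy semantic vectors are constructed as an exact linear function of the form vectors, so that both estimates can be compared directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumption): 50 "words", 20 real-valued acoustic features,
# 10 semantic dimensions. Real inputs would come from audio and from
# a distributional semantic model.
n_words, n_cues, n_dims = 50, 20, 10
C = rng.normal(size=(n_words, n_cues))        # form matrix, one row per word
F_true = rng.normal(size=(n_cues, n_dims))    # hidden linear mapping (toy only)
S = C @ F_true                                # semantic matrix, one row per word


def ldl_mapping(C, S):
    """Estimate the form-to-meaning mapping in one step by least squares."""
    F, *_ = np.linalg.lstsq(C, S, rcond=None)
    return F


def incremental_mapping(C, S, lr=0.01, epochs=500):
    """Estimate the same mapping with error-driven, one-word-at-a-time updates."""
    F = np.zeros((C.shape[1], S.shape[1]))
    for _ in range(epochs):
        for c, s in zip(C, S):
            error = s - c @ F                 # prediction error for this word
            F += lr * np.outer(c, error)      # delta-rule weight update
    return F


def recognise(c, F, S):
    """Return the index of the word whose semantic vector best matches c @ F."""
    s_hat = c @ F
    sims = [np.corrcoef(s_hat, s)[0, 1] for s in S]
    return int(np.argmax(sims))


F_ldl = ldl_mapping(C, S)
F_inc = incremental_mapping(C, S)

# Largest absolute difference between the two estimated mappings.
print(np.max(np.abs(F_ldl - F_inc)))
# Fraction of toy words recognised from their own form vector.
print(np.mean([recognise(C[i], F_ldl, S) == i for i in range(n_words)]))
```

In this toy setting the incremental estimate approaches the least-squares estimate as the number of passes over the data grows, which is one way to read "incremental learning at the limit of experience". Recognition here is scored by correlating the predicted semantic vector with every known word's vector; that choice of similarity measure is an assumption of the sketch.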
Title: LDL-AURIS: a computational model, grounded in error-driven learning, for the comprehension of single spoken words