Regularized models of audiovisual integration of speech with predictive power for sparse behavioral data, Journal of Mathematical Psychology

Journal of Mathematical Psychology (IF 2.2), Pub Date: 2020-09-01, DOI: 10.1016/j.jmp.2020.102404
Tobias S. Andersen, Ole Winther

Abstract: Audiovisual integration can facilitate speech comprehension by combining information from lip-reading with auditory speech perception. When incongruent acoustic speech is dubbed onto a video of a talking face, this integration can lead to the McGurk illusion of hearing a phoneme different from the one spoken by the voice. Several computational models of the information integration process underlying these phenomena exist. All are based on the assumption that the integration process is, in some sense, optimal. They differ, however, in whether they assume it operates on continuous or on categorical internal representations. Here we develop models of audiovisual integration of phonetic information encoded on an internal representation that is continuous and cyclical. We compare these models to the Fuzzy Logical Model of Perception (FLMP), which is based on a categorical internal representation. Using cross-validation, we show that model evaluation criteria based on goodness-of-fit are poor measures of the models’ generalization error, even when they take the number of free parameters into account. We also show that the predictive power of all the models benefits from regularization that limits the precision of the internal representation. Finally, we show that, unlike the FLMP, models based on a continuous internal representation have good predictive power when properly regularized.
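
For readers unfamiliar with the FLMP baseline, its multiplicative combination rule (from Massaro's formulation, not spelled out in the abstract) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names `flmp_predict`, `multinomial_nll` and `aic`, and the support values in the usage example, are hypothetical; AIC stands in for "criteria that take the number of free parameters into account", since the abstract does not say which criteria the authors used; and the continuous, cyclical models the authors develop are not reproduced here.

```python
import math
from typing import Sequence


def flmp_predict(auditory: Sequence[float], visual: Sequence[float]) -> list[float]:
    """FLMP combination rule: P(response i | A, V) = a_i * v_i / sum_k a_k * v_k.

    auditory[i] and visual[i] are the fuzzy truth values (in (0, 1]) with which
    the auditory and visual stimuli support response category i.
    """
    if len(auditory) != len(visual):
        raise ValueError("auditory and visual supports must have equal length")
    products = [a * v for a, v in zip(auditory, visual)]
    total = sum(products)
    if total <= 0:
        raise ValueError("at least one category must receive nonzero support")
    return [p / total for p in products]


def multinomial_nll(counts: Sequence[int], probs: Sequence[float]) -> float:
    """Negative log-likelihood of observed response counts, dropping the
    multinomial coefficient (constant for a fixed data set): a goodness-of-fit
    measure of the kind the abstract argues is a poor proxy for generalization."""
    return -sum(c * math.log(p) for c, p in zip(counts, probs) if c > 0)


def aic(nll: float, n_params: int) -> float:
    """Akaike information criterion, 2k + 2*NLL (assumed stand-in for a
    parameter-penalized goodness-of-fit criterion)."""
    return 2 * n_params + 2 * nll


if __name__ == "__main__":
    # Hypothetical McGurk-style trial: auditory /ba/ dubbed onto visual /ga/,
    # response categories ("b", "d", "g"); the support values are made up.
    probs = flmp_predict([0.8, 0.3, 0.1], [0.1, 0.6, 0.7])
    print([round(p, 3) for p in probs])  # [0.242, 0.545, 0.212]
```

With these made-up supports, the fused "d" response is the most probable, which is the qualitative McGurk pattern. In a cross-validation setting of the kind the abstract describes, one would fit the support parameters on some stimulus conditions and evaluate `multinomial_nll` on held-out conditions, rather than reading generalization off the in-sample fit or its AIC.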

Updated: 2020-09-01