Leveraging the temporal dynamics of anticipatory vowel-to-vowel coarticulation in linguistic prediction: A statistical modeling approach
Journal of Phonetics (IF 2.440), Pub Date: 2021-08-25, DOI: 10.1016/j.wocn.2021.101093
Stefon Flego, Jon Forrest

Previous research has shown that coarticulatory information in the signal orients listeners in spoken word recognition, and that articulatory and perceptual dynamics closely parallel one another. The current study uses statistical classification to test the power of time-varying anticipatory coarticulatory information in the acoustic signal for predicting upcoming sounds in the speech stream. Bayesian mixed-effects multinomial logistic regression models were trained on several different representations of spectral variation present in V1 in order to predict the identity of V2 in naturally coarticulated transconsonantal V1…V2 sequences. Models trained on simple measures of spectral variation (e.g. formant measures taken at V1 midpoint) were compared with models trained on more sophisticated time-varying representations (e.g. the estimated coefficients of polynomial curves fit to whole formant trajectories of V1). Accuracy in predicting V2 was greater when models were trained on dynamic representations of spectral variation in V1, and those trained on quadratic and cubic polynomial representations achieved the greatest accuracy, yielding a gain of more than 15 percentage points in correct classification over models using midpoint formant frequencies alone. The results demonstrate that spectral representations with high temporal resolution capture more disambiguating anticipatory information available in the signal than representations with lower temporal resolution.
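The core modeling idea — representing each V1 formant trajectory by the coefficients of a fitted polynomial and feeding those coefficients to a classifier — can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the data are synthetic F2 trajectories, the drift function and all parameter values are invented, and a simple nearest-centroid classifier stands in for the paper's Bayesian mixed-effects multinomial regression.

```python
import numpy as np

# Synthetic sketch: F2 trajectories of V1 whose offsets drift toward the
# target of the upcoming V2, classified either from a single midpoint
# measure or from polynomial-curve coefficients fit to the whole trajectory.
rng = np.random.default_rng(0)
n_per_class, n_points = 50, 10
t = np.linspace(0.0, 1.0, n_points)  # normalized time within V1

def make_trajectories(shift, n):
    """Baseline F2 near 1500 Hz plus anticipatory drift toward V2.

    The drift term (shift * t**6) concentrates the coarticulatory cue
    near the V1 offset, so a midpoint measure sees little of it.
    These parameters are illustrative assumptions, not the paper's data.
    """
    base = 1500.0 + rng.normal(0.0, 20.0, size=(n, 1))
    return base + shift * t**6 + rng.normal(0.0, 15.0, size=(n, n_points))

X = np.vstack([make_trajectories(+300.0, n_per_class),   # V2 = /i/-like
               make_trajectories(-300.0, n_per_class)])  # V2 = /u/-like
y = np.repeat([0, 1], n_per_class)

def poly_features(X, degree):
    # One polynomial fit per trajectory; its coefficients are the features.
    return np.array([np.polyfit(t, row, degree) for row in X])

def midpoint_features(X):
    return X[:, [n_points // 2]]  # single static measure at V1 midpoint

def centroid_accuracy(F, y):
    # Nearest-centroid classifier as a stand-in for the paper's Bayesian
    # multinomial regression (training accuracy, for illustration only).
    c0, c1 = F[y == 0].mean(axis=0), F[y == 1].mean(axis=0)
    pred = (np.linalg.norm(F - c1, axis=1)
            < np.linalg.norm(F - c0, axis=1)).astype(int)
    return float((pred == y).mean())

acc_mid = centroid_accuracy(midpoint_features(X), y)
acc_quad = centroid_accuracy(poly_features(X, 2), y)
acc_cub = centroid_accuracy(poly_features(X, 3), y)
print(f"midpoint: {acc_mid:.2f}  quadratic: {acc_quad:.2f}  cubic: {acc_cub:.2f}")
```

Because the synthetic cue strengthens toward the V1 offset, the quadratic- and cubic-coefficient features recover the class-distinguishing drift from the whole trajectory, while the midpoint measure captures only a fraction of it — mirroring, in toy form, the paper's finding that dynamic representations outperform static ones.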


