Automated assessment of second language comprehensibility: Review, training, validation, and generalization studies
Studies in Second Language Acquisition (IF 4.2), Pub Date: 2022-03-28, DOI: 10.1017/s0272263122000080
Kazuya Saito 1, Konstantinos Macmillan 2, Magdalena Kachlicka 2, Takuya Kunihara 3, Nobuaki Minematsu 3

Whereas many scholars have emphasized the relative importance of comprehensibility as an ecologically valid goal for L2 speech training, testing, and development, eliciting listeners’ judgments is time-consuming. Following calls for research on more efficient L2 speech rating methods in applied linguistics, and growing attention toward using machine learning on spontaneous unscripted speech in speech engineering, the current study examined the possibility of establishing quick and reliable automated comprehensibility assessments. Orchestrating a set of phonological (maximum posterior probabilities and gaps between L1 and L2 speech), prosodic (pitch and intensity variation), and temporal measures (articulation rate, pause frequency), the regression model significantly predicted how naïve listeners intuitively judged low, mid, high, and nativelike comprehensibility among 100 L1 and L2 speakers’ picture descriptions. The strength of the correlation (r = .823 for machine vs. human ratings) was comparable to naïve listeners’ interrater agreement (r = .760 for humans vs. humans). The findings were successfully replicated when the model was applied to a new dataset of 45 L1 and L2 speakers (r = .827) and tested under a more freely constructed interview task condition (r = .809).
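The evaluation logic summarized above (fit a regression on acoustic measures, then correlate its predictions with listener ratings on the training sample and on a new sample) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the three features stand in for the larger measure set named in the abstract, the coefficients inside `simulate_speakers` are invented for the demo, and all function names are hypothetical. The sample sizes of 100 and 45 only mirror those reported in the abstract, and the resulting correlations are not expected to match the published values.

```python
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def fit_ols(features, targets):
    """Ordinary least squares with an intercept.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan
    elimination with partial pivoting. Returns [intercept, b1, b2, ...].
    """
    rows = [[1.0, *r] for r in features]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(aug[idx][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for idx in range(k):
            if idx != col:
                factor = aug[idx][col]
                aug[idx] = [a - factor * b for a, b in zip(aug[idx], aug[col])]
    return [aug[i][k] for i in range(k)]


def predict(coefs, features):
    """Apply fitted coefficients to each feature row."""
    return [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in features]


def simulate_speakers(n, rng):
    """Synthetic speakers: (articulation rate, pause frequency, pitch variation).

    The generating equation is an assumption made for this demo only.
    """
    features, ratings = [], []
    for _ in range(n):
        rate = rng.uniform(2.0, 5.0)
        pauses = rng.uniform(0.0, 3.0)
        pitch = rng.uniform(0.0, 2.0)
        rating = 1.0 + 1.2 * rate - 0.8 * pauses + 0.5 * pitch + rng.gauss(0.0, 0.5)
        features.append((rate, pauses, pitch))
        ratings.append(rating)
    return features, ratings


rng = random.Random(0)
train_x, train_y = simulate_speakers(100, rng)  # training sample
new_x, new_y = simulate_speakers(45, rng)       # unseen sample for validation

coefs = fit_ols(train_x, train_y)
r_train = pearson_r(predict(coefs, train_x), train_y)
r_new = pearson_r(predict(coefs, new_x), new_y)
print(f"training r = {r_train:.3f}, new-sample r = {r_new:.3f}")
```

Comparing the machine-versus-human correlation with the correlation among human raters, as the abstract does, gives a practical benchmark: a model that agrees with listeners about as well as listeners agree with each other is performing near the ceiling set by rating reliability.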




Updated: 2022-03-28