Assessing L2 English speaking using automated scoring technology: examining automarker reliability
Assessment in Education: Principles, Policy & Practice (IF 2.7). Pub Date: 2021-09-28. DOI: 10.1080/0969594x.2021.1979467
Jing Xu, Edmund Jones, Victoria Laxton, Evelina Galaczi

ABSTRACT

Recent advances in machine learning have made automated scoring of learner speech widespread, and yet validation research that provides support for applying automated scoring technology to assessment is still in its infancy. Both the educational measurement and language assessment communities have called for greater transparency in describing scoring algorithms and research evidence about the reliability of automated scoring. This paper reports on a study that investigated the reliability of an automarker using candidate responses produced in an online oral English test. Based on ‘limits of agreement’ and multi-faceted Rasch analyses on automarker scores and individual examiner scores, the study found that the automarker, while exhibiting excellent internal consistency, was slightly more lenient than examiner fair average scores, particularly for low-proficiency speakers. Additionally, it was found that an automarker uncertainty measure termed Language Quality, which indicates the confidence of speech recognition, was useful for predicting automarker reliability and flagging abnormal speech.
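The 'limits of agreement' (Bland-Altman) analysis mentioned above compares paired scores from two raters by examining the mean and spread of their differences. A minimal sketch of that computation, using invented score data for illustration (the study's actual test scores and scale are not reproduced here):

```python
import numpy as np

# Hypothetical paired scores: automarker vs. examiner fair average.
# The values below are invented purely to illustrate the calculation.
automarker = np.array([4.2, 3.8, 5.0, 2.9, 4.5, 3.1, 4.8, 2.5])
examiner = np.array([4.0, 3.9, 4.8, 2.4, 4.4, 2.8, 4.7, 2.0])

diff = automarker - examiner  # positive mean => automarker is more lenient
bias = diff.mean()            # mean difference (systematic leniency/severity)
sd = diff.std(ddof=1)         # sample SD of the differences

# 95% limits of agreement: the range within which most
# automarker-examiner differences are expected to fall.
loa_lower = bias - 1.96 * sd
loa_upper = bias + 1.96 * sd

print(f"bias = {bias:.3f}")
print(f"limits of agreement = [{loa_lower:.3f}, {loa_upper:.3f}]")
```

A positive bias with narrow limits would correspond to the study's finding of slight but consistent automarker leniency; wide limits would instead indicate unreliable agreement even if the mean difference were small.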



