Using Machine Learning to Score Multi-Dimensional Assessments of Chemistry and Physics
Journal of Science Education and Technology (IF 4.4), Pub Date: 2021-03-26, DOI: 10.1007/s10956-020-09895-9
Sarah Maestrales, Xiaoming Zhai, Israel Touitou, Quinton Baker, Barbara Schneider, Joseph Krajcik

In response to the call to promote three-dimensional science learning (NRC, 2012), researchers argue for assessment items that move beyond rote memorization to tasks requiring deeper understanding and the use of reasoning that can improve science literacy. Such items are usually performance-based constructed responses, and they typically require technological support to ease the scoring burden placed on teachers. This study responds to that call by examining the use and accuracy of a machine learning text analysis protocol as an alternative to human scoring of constructed response items. The items we employed represent multiple dimensions of science learning as articulated in the 2012 NRC report. Using a sample of over 26,000 constructed responses written by 6,700 students in chemistry and physics, we trained human raters and compiled a robust training set to develop machine algorithmic models and cross-validate the machine scores. Results show that human raters achieved good (Cohen's κ = .40–.75) to excellent (Cohen's κ > .75) interrater reliability on assessment items spanning varied numbers of dimensions. A comparison reveals that the machine scoring algorithms achieved scoring accuracy comparable to human raters on these same items. Results also show that responses containing formal vocabulary (e.g., velocity) tended to yield lower machine-human agreement, which may reflect the fact that fewer students employed formal phrases than their informal alternatives.
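The sketch below illustrates the kind of pipeline the abstract describes: train a text classifier on human-scored responses, score held-out responses, and check machine-human agreement with Cohen's kappa, κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The toy data, 0/1 rubric, TF-IDF features, and logistic regression classifier are illustrative assumptions, not the authors' actual protocol.

# Minimal illustrative sketch of machine scoring of constructed responses.
# The toy data, features, and classifier are assumptions, not the study's protocol.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical human-scored responses (0 = incorrect, 1 = correct); the study's
# training set held thousands of responses rated by trained human raters.
responses = [
    "the velocity increases because the net force stays constant",
    "it speeds up since gravity keeps pulling it down",
    "the object just moves",
    "because it is heavy",
    "kinetic energy grows as the cart accelerates down the ramp",
    "the cart goes faster and faster on the slope",
    "it is fast",
    "i do not know",
    "acceleration is constant so the speed rises steadily",
    "the pull of the earth makes it gain speed every second",
    "things fall down",
    "no idea why",
]
human_scores = [1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0]

# Hold out responses to cross-validate machine scores against human scores.
train_x, test_x, train_y, test_y = train_test_split(
    responses, human_scores, test_size=4, random_state=0, stratify=human_scores)

# TF-IDF unigrams/bigrams feed a logistic regression scorer.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(train_x, train_y)

# Cohen's kappa quantifies machine-human agreement beyond chance.
machine_scores = model.predict(test_x)
print("machine-human agreement (Cohen's kappa):",
      cohen_kappa_score(test_y, machine_scores))

Stratifying the held-out split keeps both score levels in the evaluation set, so the chance-agreement term p_e stays below 1 and kappa remains well defined.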




Updated: 2021-03-26