Comparison of Machine Learning Performance Using Analytic and Holistic Coding Approaches Across Constructed Response Assessments Aligned to a Science Learning Progression
Journal of Science Education and Technology (IF 3.3), Pub Date: 2020-09-26, DOI: 10.1007/s10956-020-09858-0
Lauren N. Jescovitch, Emily E. Scott, Jack A. Cerchiara, John Merrill, Mark Urban-Lurain, Jennifer H. Doherty, Kevin C. Haudek

We systematically compared two coding approaches to generate training datasets for machine learning (ML): (i) a holistic approach based on learning progression levels and (ii) a dichotomous, analytic approach that codes multiple concepts in student reasoning, deconstructed from the holistic rubrics. We evaluated four constructed response assessment items for undergraduate physiology, each targeting five levels of a developing flux learning progression in an ion context. Human-coded datasets were used to train two ML models: (i) an ensemble of eight classification algorithms implemented in the Constructed Response Classifier (CRC), and (ii) a single classification algorithm implemented in LightSide Researcher’s Workbench. Human coding agreement on approximately 700 student responses per item was high for both approaches, with Cohen’s kappas ranging from 0.75 to 0.87 on holistic scoring and from 0.78 to 0.89 on analytic composite scoring. ML model performance varied across items and rubric type. For two items, training sets from both coding approaches produced similarly accurate ML models, with differences in Cohen’s kappa between machine and human scores of 0.002 and 0.041. For the other two items, ML models trained on analytically coded responses, with their outputs combined into a composite score, performed better than models trained on holistic scores, with increases in Cohen’s kappa of 0.043 and 0.117. These two items used a more complex scenario involving the movement of two ions. It may be that analytic coding helps unpack this additional complexity.
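Because every agreement figure in the abstract is reported as Cohen's kappa, a minimal sketch of the statistic may help readers interpret the values. It computes the unweighted form, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The abstract does not say whether an unweighted or weighted kappa was used, so the unweighted form is an assumption here, and the function name `cohens_kappa` and the score lists are hypothetical illustration data, not results from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length sequences of labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)

    # Observed agreement: share of responses given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: sum over labels of the product of the raters' marginal rates.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if p_e == 1.0:
        # Both raters used one identical label throughout; agreement is perfect.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical learning-progression levels (1-5) for ten responses.
# These numbers are made up for illustration only.
human_scores = [1, 2, 2, 3, 4, 5, 3, 2, 1, 4]
machine_scores = [1, 2, 3, 3, 4, 5, 3, 2, 2, 4]

kappa = cohens_kappa(human_scores, machine_scores)
print(f"kappa = {kappa:.3f}")
```

The same function applies to human-human agreement and to human-machine agreement, which is how the abstract uses the statistic in both places.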




Updated: 2020-09-26