Exploring the correspondence between traditional score resolution methods and person fit indices in rater-mediated writing assessments
Assessing Writing (IF 4.2), Pub Date: 2019-01-01, DOI: 10.1016/j.asw.2018.12.002
Stefanie A. Wind, A. Adrienne Walker

Abstract: Scoring procedures for rater-mediated writing assessments often include checks for agreement between the raters who score students’ essays. When raters assign non-adjacent ratings to the same essay, a third rater is often employed to “resolve” the discrepant ratings. The procedures for flagging essays for score resolution are similar to person fit analyses based on item response theory (IRT). We used data from two writing performance assessments in science and social studies to explore the correspondence between traditional score resolution procedures and IRT person fit statistics. We observed that rater agreement criteria and person fit criteria flag many, but not all, of the same rating profiles for additional investigation. We also observed significantly different values of person fit statistics between students whose essays were and were not flagged for third ratings by the rater agreement criteria. Finally, when we used resolved ratings in place of the original ratings, we observed improvements in person fit for most, but not all, of the students whose essays were flagged for third ratings. These results suggest that person fit analyses may provide a complementary approach to rater agreement criteria. We discuss these results in terms of their implications for research and practice.

Updated: 2019-01-01