Re-conceptualising and accounting for examiner (cut-score) stringency in a ‘high frequency, small cohort’ performance test
Advances in Health Sciences Education (IF 3.0), Pub Date: 2020-09-02, DOI: 10.1007/s10459-020-09990-x
Matt Homer

Variation in examiner stringency is an ongoing problem in many performance settings, such as OSCEs, and is usually conceptualised and measured based on the scores/grades that examiners award. Under borderline regression, the standard within a station is set using checklist/domain scores and global grades acting in combination. This complexity requires a more nuanced view of what stringency might mean when considering sources of variation in station cut-scores. This study uses data from 349 administrations of an 18-station, 36-candidate, single-circuit OSCE for international medical graduates wanting to practise in the UK (PLAB2). The station-level data were gathered over a 34-month period up to July 2019. Linear mixed models are used to estimate and then separate out examiner (n = 547), station (n = 330) and examination (n = 349) effects on borderline regression cut-scores. Examiners are the largest source of variation, accounting for 56% of variance in cut-scores, compared to 6% for stations, < 1% for examinations and 37% residual. Aggregating to the exam level tends to ameliorate this effect. For 96% of examinations, a 'fair' cut-score, equalising out the variation in examiner stringency that candidates experience, is within one standard error of measurement (SEM) of the actual cut-score. The addition of the SEM to produce the final pass mark generally ensures the public is protected from almost all false positives in the examination caused by examiner cut-score stringency acting in candidates' favour.
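To make the borderline regression step concrete, here is a minimal sketch of how a station cut-score could be derived from one examiner's marks, and how an exam-level pass mark could then be formed by adding the SEM. It is an illustration under stated assumptions, not the study's actual procedure: the function names, the 0 to 3 global grade scale (with 1 meaning "borderline"), the sample marks, and the choice to sum station cut-scores into an exam cut-score are all hypothetical.

```python
def borderline_regression_cut_score(grades, scores, borderline_grade=1.0):
    """Regress checklist/domain scores on global grades (ordinary least
    squares) and return the predicted score at the borderline grade.

    Assumption: grades are numeric, e.g. 0=fail, 1=borderline, 2=pass, 3=good.
    """
    n = len(grades)
    if n != len(scores) or n < 2:
        raise ValueError("need at least two paired grade/score observations")

    mean_g = sum(grades) / n
    mean_s = sum(scores) / n
    sxx = sum((g - mean_g) ** 2 for g in grades)
    if sxx == 0:
        raise ValueError("all grades are identical; slope is undefined")

    sxy = sum((g - mean_g) * (s - mean_s) for g, s in zip(grades, scores))
    slope = sxy / sxx
    intercept = mean_s - slope * mean_g

    # The station cut-score is the fitted score at the borderline grade.
    return intercept + slope * borderline_grade


def exam_pass_mark(station_cut_scores, sem):
    """Assumption: the exam cut-score is the sum of station cut-scores,
    and the final pass mark adds one SEM on top of it."""
    return sum(station_cut_scores) + sem


# Hypothetical marks for four candidates in one station.
grades = [0, 1, 2, 3]
scores = [4, 6, 8, 10]
station_cut = borderline_regression_cut_score(grades, scores)
```

Because the cut-score depends on both the grades and the scores an examiner awards, an examiner can shift it through either channel, which is why stringency is harder to pin down here than with raw scores alone.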

Updated: 2020-09-02