Accuracy of rating scale interval values used in multiple mini-interviews: a mixed methods study
Advances in Health Sciences Education ( IF 3.0 ) Pub Date : 2020-05-06 , DOI: 10.1007/s10459-020-09970-1
Philippe Bégin , Robert Gagnon , Jean-Michel Leduc , Béatrice Paradis , Jean-Sébastien Renaud , Jacinthe Beauchamp , Richard Rioux , Marie-Pier Carrier , Claire Hudon , Marc Vautour , Annie Ouellet , Martine Bourget , Christian Bourdy

When determining the score given to candidates in multiple mini-interview (MMI) stations, raters have to translate a narrative judgment onto an ordinal rating scale. When individual scores are added to calculate the final ranking, it is generally presumed that the values of possible scores on the evaluation grid are separated by constant intervals, following a linear function, although this assumption is seldom validated with raters themselves. Inaccurate interval values could lead to systemic bias that could potentially distort candidates’ final cumulative scores. The aim of this study was to establish rating scale values based on raters’ intent, to validate these with an independent quantitative method, to explore their impact on final scores, and to appraise their meaning according to experienced MMI interviewers. A 4-round consensus-group exercise was independently conducted with 42 MMI interviewers who were asked to determine relative values for the 6-point rating scale (from A to F) used in the Canadian integrated French MMI (IFMMI). In parallel, relative values were also calculated for each option of the scale by comparing the average scores concurrently given to the same individual in other stations every time that option was selected during three consecutive IFMMI years. Data from the same three cohorts were used to simulate the impact of using new score values on final rankings. Comments from the consensus-group exercise were reviewed independently by two authors to explore raters’ rationale for choosing specific values. Relative to the maximum (A = 100%) and minimum (F = 0%), experienced raters arrived at values of 86.7% (95% CI 86.3–87.1), 69.5% (68.9–70.1), 51.2% (50.6–51.8), and 29.3% (28.1–30.5) for scores of B, C, D and E, respectively. The concurrent score approach was based on 43,412 IFMMI stations performed by 4345 medical school applicants. It provided nearly identical values of 87.1% (82.4–91.5), 70.4% (66.1–74.7), 51.2% (47.1–55.3) and 31.8% (27.9–35.7), respectively. Qualitative analysis indicated that while high scores are usually based on minor details of relatively low importance, low scores are usually assigned for more serious offenses and were assumed by the raters to carry more weight in the final score. Individual drops or increases in final MMI ranking with the use of new scale values ranged from −21 to +5 percentiles, with the average candidate changing by ±1.4 percentiles. Consulting with experienced interviewers is a simple and effective approach to establish rating scale values that truly reflect raters’ intent in MMI, thus improving the accuracy of the instrument and contributing to the general fairness of the process.
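
To make the effect of unequal intervals concrete, the following minimal Python sketch compares an equal-interval scale with the consensus values reported in the abstract. It is an illustration only, not the authors' simulation: the equal-interval values (100, 80, 60, 40, 20, 0) are an assumed reading of the "constant intervals" presumption, the two candidates and their ratings are hypothetical, and the names `mean_score` and `rank` are invented for this example.

```python
# Consensus-group values reported in the abstract (percent of maximum).
CONSENSUS = {"A": 100.0, "B": 86.7, "C": 69.5, "D": 51.2, "E": 29.3, "F": 0.0}

# Assumed equal-interval (linear) values; the abstract does not list them.
LINEAR = {"A": 100.0, "B": 80.0, "C": 60.0, "D": 40.0, "E": 20.0, "F": 0.0}


def mean_score(ratings, scale):
    """Average the scale values of one candidate's station ratings."""
    return sum(scale[r] for r in ratings) / len(ratings)


def rank(candidates, scale):
    """Return candidate names ordered from highest to lowest mean score."""
    return sorted(
        candidates,
        key=lambda name: mean_score(candidates[name], scale),
        reverse=True,
    )


# Hypothetical candidates with five stations each (not data from the study).
candidates = {
    "all_b": ["B", "B", "B", "B", "B"],
    "one_low": ["A", "A", "A", "A", "E"],
}

print(rank(candidates, LINEAR))     # ['one_low', 'all_b']
print(rank(candidates, CONSENSUS))  # ['all_b', 'one_low']
```

Under the assumed linear values the candidate with four A ratings and one E ranks first (84 versus 80), whereas under the consensus values the candidate with five B ratings ranks first (86.7 versus about 85.9). This is one way a change of interval values alone can reorder candidates.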

Updated: 2020-05-06