Similarity of the cut score in test sets with different item amounts using the modified Angoff, modified Ebel, and Hofstee standard-setting methods for the Korean Medical Licensing Examination
Journal of Educational Evaluation for Health Professions Pub Date : 2020-10-05 , DOI: 10.3352/jeehp.2020.17.28
Janghee Park, Mi Kyoung Yim, Na Jin Kim, Duck Sun Ahn, Young-Min Kim
PURPOSE The Korean Medical Licensing Examination (KMLE) typically contains a large number of items. This study investigated whether the cut score differs between standard-setting based on all items of the exam and standard-setting based on only a subset of items. METHODS The item sets from the 3 most recent KMLEs (spanning the past 3 years) were each divided into 4 subsets of 25% based on item content categories, discrimination index, and difficulty index. The entire panel of 15 members rated all items of the 2017 exam (360 items, 100%). Using the same method, each item set in split-half set 1 contained 184 items (51%) of the 2018 exam, and each item set in split-half set 2 contained 182 items (51%) of the 2019 exam. The modified Angoff, modified Ebel, and Hofstee methods were used in the standard-setting process. RESULTS Cut scores differed by less than 1% when the same method was applied to item subsets containing 25%, 51%, or 100% of the entire set. Rating fewer items also yielded higher rater reliability. CONCLUSION When the entire item set was divided into equivalent subsets, standard-setting with a portion of the item set (90 of 360 items) yielded cut scores similar to those derived from the entire item set, and panelists' individual ratings correlated more strongly with the overall ratings.
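The modified Angoff procedure mentioned above can be sketched as follows. This is a minimal illustration with invented ratings, not the study's actual procedure or data: each panelist estimates, per item, the probability that a minimally competent examinee answers it correctly; a panelist's cut score is the mean of those probabilities, and the panel cut score is the mean across panelists.

```python
# Hypothetical sketch of an Angoff-style cut-score calculation.
# All numbers below are illustrative and do not come from the study.

def angoff_cut_score(ratings):
    """ratings: one list per panelist of per-item probabilities (0-1)
    that a minimally competent examinee answers each item correctly.
    Returns the panel cut score as a percentage of the maximum score."""
    panelist_cuts = [sum(r) / len(r) for r in ratings]      # mean per panelist
    return 100 * sum(panelist_cuts) / len(panelist_cuts)    # mean across panel

# Three panelists rating a five-item subset (invented values):
ratings = [
    [0.6, 0.7, 0.5, 0.8, 0.9],
    [0.5, 0.6, 0.6, 0.7, 0.8],
    [0.7, 0.8, 0.4, 0.9, 0.7],
]
print(round(angoff_cut_score(ratings), 1))  # panel cut score in percent
```

Comparing this panel-level cut score across item subsets of different sizes (25%, 51%, 100%) is, in essence, the comparison the study reports.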
