A Bias-Corrected RMSD Item Fit Statistic: An Evaluation and Comparison to Alternatives
Journal of Educational and Behavioral Statistics (IF 2.116), Pub Date: 2019-12-19, DOI: 10.3102/1076998619890566
Carmen Köhler, Alexander Robitzsch, Johannes Hartig

Testing whether items fit the assumptions of an item response theory model is an important step in evaluating a test. In the literature, numerous item fit statistics exist, many of which show severe limitations. The current study investigates the root mean squared deviation (RMSD) item fit statistic, which is used for evaluating item fit in various large-scale assessment studies. The three research questions of this study are (1) whether the empirical RMSD is an unbiased estimator of the population RMSD; (2) if this is not the case, whether this bias can be corrected; and (3) whether the test statistic provides an adequate significance test to detect misfitting items. Using simulation studies, it was found that the empirical RMSD is not an unbiased estimator of the population RMSD, and nonparametric bootstrapping falls short of entirely eliminating this bias. Using parametric bootstrapping, however, the RMSD can be used as a test statistic that outperforms the other approaches (infit and outfit, S-X²) with respect to both Type I error rate and power. The empirical application showed that parametric bootstrapping of the RMSD results in rather conservative item fit decisions, which suggests more lenient cut-off criteria.
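
To make the idea in the abstract concrete, the following is a minimal sketch of an RMSD-type item fit statistic with a parametric-bootstrap reference distribution. It is not the authors' implementation. Everything in it is an assumption made for illustration: a Rasch model, a discretized standard normal ability distribution, item difficulties treated as known, and the helper names `rasch_prob`, `rmsd`, `simulate`, and `bootstrap_pvalues`. In particular, a full parametric bootstrap would re-estimate the item parameters in every replicate; this sketch skips that step for brevity.

```python
import numpy as np

def rasch_prob(theta, b):
    """Model-implied probability of a correct response (Rasch model)."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def rmsd(responses, b, nodes, prior):
    """Empirical RMSD per item on a quadrature grid (illustrative definition).

    responses: (N, I) array of 0/1 scores; b: (I,) difficulties;
    nodes, prior: (Q,) ability grid and its probabilities (sum to 1).
    """
    model = rasch_prob(nodes[:, None], b[None, :])                # (Q, I)
    loglik = responses @ np.log(model).T + (1 - responses) @ np.log(1 - model).T
    post = np.exp(loglik - loglik.max(axis=1, keepdims=True)) * prior
    post /= post.sum(axis=1, keepdims=True)                       # (N, Q) posteriors
    counts = np.maximum(post.sum(axis=0), 1e-12)                  # expected N per node
    observed = (post.T @ responses) / counts[:, None]             # pseudo-observed ICC
    weights = counts / counts.sum()                               # estimated density
    return np.sqrt((weights[:, None] * (observed - model) ** 2).sum(axis=0))

def simulate(theta, b, rng):
    """Draw 0/1 responses from the Rasch model."""
    p = rasch_prob(theta[:, None], b[None, :])
    return (rng.random(p.shape) < p).astype(float)

def bootstrap_pvalues(responses, b, nodes, prior, n_boot, rng):
    """Reference distribution of the RMSD under model fit.

    Simplification: b is held fixed instead of being re-estimated
    in each bootstrap replicate.
    """
    observed = rmsd(responses, b, nodes, prior)
    n = responses.shape[0]
    null = np.empty((n_boot, b.size))
    for r in range(n_boot):
        theta = rng.choice(nodes, size=n, p=prior)   # abilities from the model
        null[r] = rmsd(simulate(theta, b, rng), b, nodes, prior)
    pvals = (1 + (null >= observed).sum(axis=0)) / (n_boot + 1)
    return observed, null.mean(axis=0), pvals

# Hypothetical usage with made-up difficulties and fitting data.
rng = np.random.default_rng(1)
nodes = np.linspace(-4, 4, 21)
prior = np.exp(-0.5 * nodes ** 2)
prior /= prior.sum()
b = np.array([-1.0, 0.0, 0.5, 1.0])
data = simulate(rng.normal(size=300), b, rng)
obs, null_mean, pvals = bootstrap_pvalues(data, b, nodes, prior, 100, rng)
print(obs, null_mean, pvals)
```

Because the RMSD is a root of squared deviations, its simulated values are strictly positive even when the data are generated from the fitted model, which is one way to see why the empirical RMSD cannot be an unbiased estimate of a population RMSD of zero. Comparing each observed value with the simulated reference distribution, rather than with a fixed cut-off, is the general idea behind using the statistic as a significance test.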

Updated: 2019-12-19