Evaluating Item Fit Statistic Thresholds in PISA: Analysis of Cross-Country Comparability of Cognitive Items, Educational Measurement: Issues and Practice

Educational Measurement: Issues and Practice (IF 2.7). Pub Date: 2020-11-24. DOI: 10.1111/emip.12404
Seang-Hwane Joo, Lale Khorramdel, Kentaro Yamamoto, Hyo Jeong Shin, Frederic Robin

In the Programme for International Student Assessment (PISA), item response theory (IRT) scaling is used to examine the psychometric properties of items and scales and to provide test scores that are comparable across participating countries and over time. To balance the cross-country comparability of IRT item parameter estimates against the best possible model fit, PISA uses a partial invariance approach. In this approach, international (common) item parameters are estimated for the majority of items, while unique (country-specific) item parameters are allowed for item-country combinations in which misfit to the common parameters is identified. The goal of the current study is to establish item fit statistic thresholds for identifying such misfit. Using data from PISA 2015 and 2018, we evaluated the impact of various item fit thresholds on scale and score estimation by systematically examining the number of unique item parameters and the country performance distributions, and by comparing overall model fit statistics. Results showed that a root mean square deviation (RMSD) threshold of .10 provides the best-fitting model while still yielding stable parameter estimates and sufficient comparability across groups. The applications and implications of these results are discussed.
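The threshold-based partial invariance step described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the operational PISA scaling code: it assumes dichotomous items under a two-parameter logistic (2PL) model, a fixed grid of ability (theta) quadrature points, and observed proportions correct already available at each point for each country. Under those assumptions, the RMSD for an item in a country is the square root of the density-weighted mean squared difference between observed and model-implied proportions correct, and item-country pairs whose RMSD exceeds the .10 threshold are flagged as candidates for country-specific parameters. The function names (`icc_2pl`, `rmsd`, `flag_misfits`) and the toy data are illustrative only.

```python
import math


def icc_2pl(theta, a, b):
    """Model-implied probability of a correct response under a 2PL model.
    (Assumption for illustration; operational scaling also handles polytomous items.)"""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def rmsd(observed, theta_points, weights, a, b):
    """Root mean square deviation between observed and model-implied
    proportions correct for one item in one country.

    observed[k]     -- observed proportion correct at theta_points[k]
    theta_points[k] -- ability quadrature point
    weights[k]      -- the country's ability density at theta_points[k];
                       normalized here so the weights sum to 1
    """
    total = sum(weights)
    mean_sq = sum(
        (w / total) * (p_obs - icc_2pl(t, a, b)) ** 2
        for p_obs, t, w in zip(observed, theta_points, weights)
    )
    return math.sqrt(mean_sq)


def flag_misfits(rmsd_table, threshold=0.10):
    """Return the (item, country) pairs whose RMSD exceeds the threshold.
    In a partial invariance approach these pairs would receive unique
    (country-specific) item parameters instead of the common ones."""
    return sorted(pair for pair, value in rmsd_table.items() if value > threshold)


if __name__ == "__main__":
    # Hypothetical toy data: one item with common parameters a=1.2, b=0.3.
    theta = [-1.0, -0.5, 0.0, 0.5, 1.0]
    weights = [1, 2, 4, 2, 1]  # unnormalized density weights
    a, b = 1.2, 0.3
    model = [icc_2pl(t, a, b) for t in theta]

    table = {
        # Observed proportions match the common parameters: RMSD is 0.
        ("item01", "Country A"): rmsd(model, theta, weights, a, b),
        # Observed proportions uniformly 0.12 higher: RMSD is about 0.12.
        ("item01", "Country B"): rmsd([p + 0.12 for p in model], theta, weights, a, b),
    }
    print(flag_misfits(table))  # [('item01', 'Country B')]
```

Raising or lowering `threshold` changes how many item-country pairs receive unique parameters, which is the trade-off between model fit and cross-country comparability that the study evaluates.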
