Polytomous Testlet Response Models for Technology-Enhanced Innovative Items: Implications on Model Fit and Trait Inference, Educational and Psychological Measurement


Educational and Psychological Measurement (IF 2.7), Pub Date: 2021-08-02, DOI: 10.1177/00131644211032261
Hyeon-Ah Kang ₁, Suhwa Han ₁, Doyoung Kim ₂, Shu-Chuan Kao ₂


The development of technology-enhanced innovative items calls for practical models that can describe polytomous testlet items. In this study, we evaluate four measurement models that can characterize polytomous items administered in testlets: (a) the generalized partial credit model (GPCM), (b) the testlet-as-a-polytomous-item model (TPIM), (c) the random-effect testlet model (RTM), and (d) the fixed-effect testlet model (FTM). Using data from GPCM, FTM, and RTM, we examine the performance of the scoring models on multiple aspects: relative model fit, absolute item fit, significance of testlet effects, parameter recovery, and classification accuracy. The empirical analysis suggests that the relative performance of the models varies substantially depending on the testlet-effect type, effect size, and trait estimator. When testlets had no effects or fixed effects, GPCM and FTM led to the most desirable measurement outcomes. When testlets had random interaction effects, RTM demonstrated the best model fit, yet its trait recovery differed substantially depending on the estimator. In particular, the advantage of RTM as a scoring model was discernible only when there were strong random effects and the trait levels were estimated with Bayes priors. In other settings, the simpler models (i.e., GPCM, FTM) performed better or comparably. The study also revealed that polytomous scoring of testlet items has limited prospects as a functional scoring method. Based on the outcomes of the empirical evaluation, we provide practical guidelines for choosing a measurement model for polytomous innovative items that are administered in testlets.
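For readers unfamiliar with the models named in the abstract, the following is a minimal Python sketch of the category probabilities under the standard GPCM, with an optional person-by-testlet shift of the kind used in random-effect testlet models. The function name `gpcm_probs`, the parameter values, and the sign convention for `gamma` (subtracted from the trait) are assumptions made for illustration; the abstract does not state the paper's exact parameterization of RTM or FTM.

```python
import numpy as np

def gpcm_probs(theta, a, b, gamma=0.0):
    """Category probabilities for one polytomous item under the GPCM.

    theta : person trait level
    a     : item discrimination
    b     : step parameters b_1..b_m (the item has m + 1 score categories)
    gamma : hypothetical person-by-testlet effect (an assumption of this
            sketch); gamma = 0.0 gives the plain GPCM
    """
    b = np.asarray(b, dtype=float)
    # Step terms a * (theta - gamma - b_v); category k uses the sum of the
    # first k terms, and category 0 uses an empty sum (zero).
    steps = a * (theta - gamma - b)
    z = np.concatenate(([0.0], np.cumsum(steps)))
    z -= z.max()  # subtract the maximum for numerical stability
    expz = np.exp(z)
    return expz / expz.sum()

# Illustrative values only, not taken from the study.
probs_plain = gpcm_probs(theta=0.5, a=1.2, b=[-1.0, 0.0, 1.0])
probs_testlet = gpcm_probs(theta=0.5, a=1.2, b=[-1.0, 0.0, 1.0], gamma=0.8)
```

Under this convention, a positive `gamma` acts like a lower trait level on every item in the same testlet, which is one way local dependence among testlet items can be represented.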


Updated: 2021-08-03
