
Comparison of Uni- and Multidimensional Models Applied in Testlet-Based Tests

Published Online: https://doi.org/10.1027/1614-2241/a000137

Abstract. The bifactor model (BM) and the testlet response model (TRM) are the multidimensional models most commonly applied to testlet-based tests. In previous research, however, these models have typically been estimated with different estimation methods (see, e.g., DeMars, 2006), so earlier findings about the implications of fitting the wrong model to the data may be confounded with the estimation procedures employed. With this in mind, the present study uses a single estimation method (marginal maximum likelihood [MML] with dimension reduction) to compare unidimensional and multidimensional approaches to testlet-based tests, and to assess the performance of several relative fit indices. Data were simulated under three models: the BM, the TRM, and the unidimensional model. Recovery of item parameters, reliability estimates, and selection rates of the relative fit indices were documented. The results were essentially consistent with those obtained through different methods (DeMars, 2006), indicating that the effect of the estimation method is negligible. Regarding the fit indices, the Akaike Information Criterion (AIC) showed the best selection rates, whereas the Bayesian Information Criterion (BIC) tended to select a model simpler than the true one. The work concludes with recommendations for practitioners and proposals for future research.
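For readers who want to reproduce this kind of comparison, the sketch below illustrates the general workflow in R with the mirt package (Chalmers, 2012), which implements MML estimation with dimension reduction for bifactor-type models. It is a minimal sketch: the item counts, generating parameters, and sample size are illustrative placeholders rather than the study's simulation design, and the TRM (a bifactor model with proportionality constraints on the within-testlet slopes) is only noted in a comment.

    ## Minimal sketch, assuming the mirt package; all generating values are
    ## illustrative and do not reproduce the study's simulation design.
    library(mirt)

    set.seed(1)
    n_items <- 12                   # e.g., 4 testlets of 3 items each
    testlet <- rep(1:4, each = 3)   # specific-factor membership per item

    ## Generating slopes: column 1 = general factor, columns 2-5 = testlets
    a <- matrix(0, n_items, 5)
    a[, 1] <- runif(n_items, 1.0, 2.0)            # general slopes
    for (k in 1:4) a[testlet == k, k + 1] <- runif(3, 0.5, 1.5)
    d <- matrix(rnorm(n_items))                   # intercepts

    dat <- simdata(a = a, d = d, N = 1000, itemtype = 'dich')

    ## Unidimensional 2PL fitted by MML
    uni <- mirt(dat, 1, itemtype = '2PL', verbose = FALSE)

    ## Full-information bifactor model; bfactor() uses the dimension-reduction
    ## scheme of Gibbons and Hedeker (1992). A TRM would additionally constrain
    ## the specific slopes to be proportional to the general slopes per testlet.
    bf <- bfactor(dat, model = testlet, verbose = FALSE)

    ## Relative fit indices for model selection
    c(AIC = extract.mirt(uni, 'AIC'), BIC = extract.mirt(uni, 'BIC'))
    c(AIC = extract.mirt(bf, 'AIC'), BIC = extract.mirt(bf, 'BIC'))
    anova(uni, bf)   # likelihood-ratio test plus AIC/BIC comparison

Smaller AIC or BIC values favor a model; consistent with the pattern reported above, BIC's heavier complexity penalty can lead it to favor the simpler unidimensional model even when a bifactor structure generated the data.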

References

  • Baghaei, P. & Aryadoust, V. (2015). Modeling local item dependence due to common test format with a multidimensional Rasch model. International Journal of Testing, 15, 71–87. https://doi.org/10.1080/15305058.2014.941108

  • Baghaei, P. & Ravand, H. (2016). Modeling local item dependence in cloze and reading comprehension test items using testlet response theory. Psicologica: International Journal of Methodology and Experimental Psychology, 37, 85–104.

  • Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 395–479). Oxford, UK: Addison-Wesley.

  • Bock, R. D., Gibbons, R. D. & Muraki, E. (1988). Full-information item factor analysis. Applied Psychological Measurement, 12, 261–280. https://doi.org/10.1177/014662168801200305

  • Bock, R., Gibbons, R., Schilling, S., Muraki, E., Wilson, D. & Wood, R. (2003). Testfact (Version 4.0) [Computer software and manual]. Lincolnwood, IL: Scientific Software International.

  • Bradlow, E. T., Wainer, H. & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64, 153–168. https://doi.org/10.1007/BF02294533

  • Brown, A. & Croudace, T. J. (2015). Scoring and estimating score precision using multidimensional IRT models. In S. P. Reise & D. A. Revicki (Eds.), Handbook of item response theory modeling: Applications to typical performance assessment (pp. 307–333). New York, NY: Routledge.

  • Cai, L. (2010). A two-tier full-information item factor analysis model with applications. Psychometrika, 75, 581–612. https://doi.org/10.1007/s11336-010-9178-0

  • Cai, L., Yang, J. S. & Hansen, M. (2011). Generalized full-information item bifactor analysis. Psychological Methods, 16, 221–248. https://doi.org/10.1037/a0023350

  • Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48, 1–29. https://doi.org/10.18637/jss.v048.i06

  • Chen, F. F., Hayes, A., Carver, C. S., Laurenceau, J.-P. & Zhang, Z. (2012). Modeling general and specific variance in multifaceted constructs: A comparison of the bifactor model to other approaches. Journal of Personality, 80, 219–251. https://doi.org/10.1111/j.1467-6494.2011.00739.x

  • DeMars, C. E. (2006). Application of the bi-factor multidimensional item response theory model to testlet-based tests. Journal of Educational Measurement, 43, 145–168. https://doi.org/10.1111/j.1745-3984.2006.00010.x

  • DeMars, C. E. (2012). Confirming testlet effects. Applied Psychological Measurement, 36, 104–121. https://doi.org/10.1177/0146621612437403

  • DeMars, C. E. (2013). A tutorial on interpreting bifactor model scores. International Journal of Testing, 13, 354–378. https://doi.org/10.1080/15305058.2013.799067

  • Gibbons, R. D. & Hedeker, D. R. (1992). Full-information item bi-factor analysis. Psychometrika, 57, 423–436. https://doi.org/10.1007/BF02295430

  • Jiao, H., Wang, S. & He, W. (2013). Estimation methods for one-parameter testlet models. Journal of Educational Measurement, 50, 186–203. https://doi.org/10.1111/jedm.12010

  • Li, Y., Bolt, D. M. & Fu, J. (2006). A comparison of alternative models for testlets. Applied Psychological Measurement, 30, 3–21. https://doi.org/10.1177/0146621605275414

  • Li, Y. & Rupp, A. A. (2011). Performance of the S − χ2 statistic for full-information bifactor models. Educational and Psychological Measurement, 71, 986–1005. https://doi.org/10.1177/0013164410392031

  • Morgan, G. B., Hodge, K. J., Wells, K. E. & Watkins, M. W. (2015). Are fit indices biased in favor of bi-factor models in cognitive ability research? A comparison of fit in correlated factors, higher-order, and bi-factor models via Monte Carlo simulations. Journal of Intelligence, 3, 2–20. https://doi.org/10.3390/jintelligence3010002

  • Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159–176. https://doi.org/10.1177/014662169201600206

  • Reise, S. P., Cook, K. F. & Moore, T. M. (2014). Evaluating the impact of multidimensionality on unidimensional item response theory model parameters. In S. P. Reise & D. A. Revicki (Eds.), Handbook of item response theory modeling: Applications to typical performance assessment (pp. 13–40). New York, NY: Routledge.

  • R Core Team. (2014). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.

  • Rijmen, F. (2009). Efficient full information maximum likelihood estimation for multidimensional IRT models. ETS Research Report Series, 2009, i–31. https://doi.org/10.1002/j.2333-8504.2009.tb02160.x

  • Rijmen, F. (2010). Formal relations and an empirical comparison among the bi-factor, the testlet, and a second-order multidimensional IRT model. Journal of Educational Measurement, 47, 361–372. https://doi.org/10.1111/j.1745-3984.2010.00118.x

  • Rijmen, F., Jeon, M., von Davier, M. & Rabe-Hesketh, S. (2014). A third-order item response theory model for modeling the effects of domains and subdomains in large-scale educational assessment surveys. Journal of Educational and Behavioral Statistics, 39, 235–256. https://doi.org/10.3102/1076998614531045

  • Sireci, S. G., Thissen, D. & Wainer, H. (1991). On the reliability of testlet-based tests. Journal of Educational Measurement, 28, 237–247. https://doi.org/10.1111/j.1745-3984.1991.tb00356.x

  • Thissen, D. (2013). Using the testlet response model as a shortcut to multidimensional item response theory subscore computation. In R. E. Millsap, L. A. van der Ark, D. M. Bolt & C. M. Woods (Eds.), New developments in quantitative psychology (pp. 29–40). New York, NY: Springer.

  • Thissen, D., Steinberg, L. & Mooney, J. A. (1989). Trace lines for testlets: A use of multiple-categorical-response models. Journal of Educational Measurement, 26, 247–260. https://doi.org/10.1111/j.1745-3984.1989.tb00331.x

  • Wainer, H., Bradlow, E. T. & Du, Z. (2000). Testlet response theory: An analog for the 3PL model useful in testlet-based adaptive testing. In W. J. van der Linden & G. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 245–269). Amsterdam, The Netherlands: Springer.

  • Wainer, H. & Kiely, G. L. (1987). Item clusters and computerized adaptive testing: A case for testlets. Journal of Educational Measurement, 24, 185–201. https://doi.org/10.1111/j.1745-3984.1987.tb00274.x

  • Wainer, H. & Wang, X. (2000). Using a new statistical model for testlets to score TOEFL. Journal of Educational Measurement, 37, 203–220. https://doi.org/10.1111/j.1745-3984.2000.tb01083.x

  • Wirth, R. J. & Edwards, M. C. (2007). Item factor analysis: Current approaches and future directions. Psychological Methods, 12, 58–79. https://doi.org/10.1037/1082-989X.12.1.58

  • Yen, W. M. (1984). Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Applied Psychological Measurement, 8, 125–145. https://doi.org/10.1177/014662168400800201

  • Yen, W. M. (1993). Scaling performance assessments: Strategies for managing local item dependence. Journal of Educational Measurement, 30, 187–213. https://doi.org/10.1111/j.1745-3984.1993.tb00423.x