Abstract
The bifactor model (BM) and the testlet response model (TRM) are the multidimensional models most commonly applied to testlet-based tests. In practice, these models are typically estimated with different estimation methods (see, e.g., DeMars, 2006). A possible consequence is that previous findings about the implications of fitting a wrong model to the data may be confounded with the estimation procedures employed. With this in mind, the present study uses a single method (maximum marginal likelihood [MML] with dimension reduction) to compare unidimensional and multidimensional modeling strategies for testlet-based tests, and to assess the performance of several relative fit indices. Data were simulated under three different models, namely the BM, the TRM, and the unidimensional model. Recovery of item parameters, reliability estimates, and selection rates of the relative fit indices were documented. The results were essentially consistent with those obtained through different estimation methods (DeMars, 2006), indicating that the effect of the estimation method is negligible. Regarding the fit indices, the Akaike information criterion (AIC) showed the best selection rates, whereas the Bayesian information criterion (BIC) tended to select a model simpler than the true one. The work concludes with recommendations for practitioners and proposals for future research.
References
(2015). Modeling local item dependence due to common test format with a multidimensional Rasch model. International Journal of Testing, 15, 71–87. https://doi.org/10.1080/15305058.2014.941108
(2016). Modeling local item dependence in cloze and reading comprehension test items using testlet response theory. Psicologica: International Journal of Methodology and Experimental Psychology, 37, 85–104.
(1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 395–479). Oxford, UK: Addison-Wesley.
(1988). Full-information item factor analysis. Applied Psychological Measurement, 12, 261–280. https://doi.org/10.1177/014662168801200305
(2003). Testfact (Version 4.0) [Computer software and manual]. Lincolnwood, IL: Scientific Software International.
(1999). A Bayesian random effects model for testlets. Psychometrika, 64, 153–168. https://doi.org/10.1007/BF02294533
(2015). Scoring and estimating score precision using multidimensional IRT models. In S. P. Reise & D. A. Revicki (Eds.), Handbook of item response theory modeling: Applications to typical performance assessment (pp. 307–333). New York, NY: Routledge.
(2010). A two-tier full-information item factor analysis model with applications. Psychometrika, 75, 581–612. https://doi.org/10.1007/s11336-010-9178-0
(2011). Generalized full-information item bifactor analysis. Psychological Methods, 16, 221–248. https://doi.org/10.1037/a0023350
(2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48, 1–29. https://doi.org/10.18637/jss.v048.i06
(2012). Modeling general and specific variance in multifaceted constructs: A comparison of the bifactor model to other approaches. Journal of Personality, 80, 219–251. https://doi.org/10.1111/j.1467-6494.2011.00739.x
(2006). Application of the Bi-Factor Multidimensional Item Response Theory Model to testlet-based tests. Journal of Educational Measurement, 43, 145–168. https://doi.org/10.1111/j.1745-3984.2006.00010.x
(2012). Confirming testlet effects. Applied Psychological Measurement, 36, 104–121. https://doi.org/10.1177/0146621612437403
(2013). A tutorial on interpreting Bifactor Model scores. International Journal of Testing, 13, 354–378. https://doi.org/10.1080/15305058.2013.799067
(1992). Full-information item bi-factor analysis. Psychometrika, 57, 423–436. https://doi.org/10.1007/BF02295430
(2013). Estimation methods for one-parameter testlet models. Journal of Educational Measurement, 50, 186–203. https://doi.org/10.1111/jedm.12010
(2006). A comparison of alternative models for testlets. Applied Psychological Measurement, 30, 3–21. https://doi.org/10.1177/0146621605275414
(2011). Performance of the S-χ2 statistic for full-information bifactor models. Educational and Psychological Measurement, 71, 986–1005. https://doi.org/10.1177/0013164410392031
(2015). Are fit indices biased in favor of bi-factor models in cognitive ability research? A comparison of fit in correlated factors, higher-order, and bi-factor models via Monte Carlo simulations. Journal of Intelligence, 3, 2–20. https://doi.org/10.3390/jintelligence3010002
(1992). A Generalized Partial Credit Model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159–176. https://doi.org/10.1177/014662169201600206
(2014). Evaluating the impact of multidimensionality on unidimensional item response theory model parameters. In S. P. Reise & D. A. Revicki (Eds.), Handbook of item response theory modeling: Applications to typical performance assessment (pp. 13–40). New York, NY: Routledge.
(2014). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
(2009). Efficient full information maximum likelihood estimation for multidimensional IRT models. ETS Research Report Series, 2009, i–31. https://doi.org/10.1002/j.2333-8504.2009.tb02160.x
(2010). Formal relations and an empirical comparison among the Bi-Factor, the Testlet, and a Second-Order Multidimensional IRT Model. Journal of Educational Measurement, 47, 361–372. https://doi.org/10.1111/j.1745-3984.2010.00118.x
(2014). A third-order item response theory model for modeling the effects of domains and subdomains in large-scale educational assessment surveys. Journal of Educational and Behavioral Statistics, 39, 235–256. https://doi.org/10.3102/1076998614531045
(1991). On the reliability of testlet-based tests. Journal of Educational Measurement, 28, 237–247. https://doi.org/10.1111/j.1745-3984.1991.tb00356.x
(2013). Using the testlet response model as a shortcut to multidimensional item response theory subscore computation. In R. E. Millsap, L. A. van der Ark, D. M. Bolt, & C. M. Woods (Eds.), New developments in quantitative psychology (pp. 29–40). New York, NY: Springer.
(1989). Trace lines for testlets: A use of multiple-categorical-response models. Journal of Educational Measurement, 26, 247–260. https://doi.org/10.1111/j.1745-3984.1989.tb00331.x
(2000). Testlet response theory: An analog for the 3PL model useful in testlet-based adaptive testing. In W. J. van der Linden & G. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 245–269). Amsterdam, The Netherlands: Springer.
(1987). Item clusters and computerized adaptive testing: A case for testlets. Journal of Educational Measurement, 24, 185–201. https://doi.org/10.1111/j.1745-3984.1987.tb00274.x
(2000). Using a new statistical model for Testlets to score TOEFL. Journal of Educational Measurement, 37, 203–220. https://doi.org/10.1111/j.1745-3984.2000.tb01083.x
(2007). Item factor analysis: Current approaches and future directions. Psychological Methods, 12, 58–79. https://doi.org/10.1037/1082-989X.12.1.58
(1984). Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Applied Psychological Measurement, 8, 125–145. https://doi.org/10.1177/014662168400800201
(1993). Scaling performance assessments: Strategies for managing local item dependence. Journal of Educational Measurement, 30, 187–213. https://doi.org/10.1111/j.1745-3984.1993.tb00423.x