
Using EM Algorithm for Finite Mixtures and Reformed Supplemented EM for MIRT Calibration

  • Theory and Methods
  • Published:
Psychometrika

Abstract

This study revisits parameter estimation in multidimensional item response theory (MIRT) and investigates computational details that have seldom been addressed when implementing the expectation–maximization algorithm for finite mixtures (EM–FM). Two research questions are addressed: Should the latent scale be rescaled after each EM cycle or only after the final EM cycle? And how can the supplemented EM algorithm be adapted to the EM–FM framework to estimate the standard errors (SEs) of all unknown parameters? Analytic details of the methods are provided, and a comprehensive simulation study is conducted to provide supporting evidence. Results reveal that rescaling after each EM cycle accelerates convergence without affecting calibration accuracy. Moreover, the SEs of all model parameters, including the item parameters and population mixing proportions, are recovered well when the sample size is relatively large (e.g., 2000).
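The rescaling question can be made concrete with a toy sketch: a Bock–Aitkin-style EM for a unidimensional Rasch (1PL) model with an empirical discrete latent distribution — a finite mixture over fixed support points — where the latent location is re-centered after each cycle. Everything below (function names, the location-only rescaling, the simulation settings) is an illustrative assumption, not the paper's multidimensional implementation; in the Rasch model the shift leaves the likelihood invariant because only the differences \(\theta - b\) enter the response probabilities.

```python
import numpy as np

def simulate_1pl(rng, n_persons, b_true):
    """Simulate binary responses from a Rasch (1PL) model."""
    theta = rng.normal(size=(n_persons, 1))
    p = 1.0 / (1.0 + np.exp(-(theta - b_true[None, :])))
    return (rng.uniform(size=p.shape) < p).astype(float)

def em_fm_1pl(x, n_nodes=15, n_cycles=50, rescale_each_cycle=True):
    """EM with a discrete latent distribution over fixed nodes (a finite
    mixture), optionally re-centering the latent scale after each cycle."""
    n_persons, n_items = x.shape
    nodes = np.linspace(-4.0, 4.0, n_nodes)      # support points
    w = np.full(n_nodes, 1.0 / n_nodes)          # mixing proportions
    b = np.zeros(n_items)                        # item difficulties
    for _ in range(n_cycles):
        # E-step: posterior over nodes for every person.
        p = 1.0 / (1.0 + np.exp(-(nodes[:, None] - b[None, :])))  # K x J
        loglik = x @ np.log(p).T + (1.0 - x) @ np.log1p(-p).T     # N x K
        post = np.exp(loglik) * w
        post /= post.sum(axis=1, keepdims=True)
        nk = post.sum(axis=0)                    # expected count at node k
        rjk = post.T @ x                         # expected corrects, K x J
        # M-step: one Newton step per difficulty, then update the weights.
        score = (rjk - nk[:, None] * p).sum(axis=0)
        info = (nk[:, None] * p * (1.0 - p)).sum(axis=0)
        b = b - score / info
        w = nk / n_persons
        if rescale_each_cycle:
            # Re-center so the latent mean is 0; harmless for the 1PL
            # because only theta - b enters the model.
            m = np.sum(w * nodes)
            nodes = nodes - m
            b = b - m
    return b, nodes, w
```

With, say, 1000 simulated examinees and 10 items, the recovered difficulties correlate highly with the generating values, and the weighted mean of the support points is zero (to floating-point precision) after the final re-centering.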


Figures 1–8 appear in the full article.


Notes

  1. In Bartolucci et al.’s (2014) study, the Fisher scoring (F–S) algorithm is used to update the parameters other than the latent weights (i.e., the item parameters and support points). The F–S algorithm alternates between a step that updates the discrimination parameters and a step that updates the difficulty parameters and support points.

  2. Note that \(F_{i}\left( {\varvec{\upgamma }}_{(j)}^{(s)} \right) \) is obtained by running only one iteration of the original EM–FM code.
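The one-EM-iteration device in this note is the core computation of the supplemented EM algorithm (Meng & Rubin, 1991): each column of the rate matrix is the numerically differentiated change in the EM mapping when one component of the converged parameter vector is perturbed. A minimal generic sketch follows; the one-parameter mixture standing in for the EM mapping is a hypothetical example, not the paper's EM–FM.

```python
import numpy as np

def sem_rate_matrix(em_map, theta_star, delta=1e-5):
    """Numerical Jacobian of the EM mapping M at the converged estimate
    theta_star: column j is (M(theta* + delta e_j) - theta*) / delta,
    i.e., one EM iteration started from a perturbed parameter vector."""
    theta_star = np.asarray(theta_star, dtype=float)
    k = theta_star.size
    R = np.empty((k, k))
    for j in range(k):
        pert = theta_star.copy()
        pert[j] += delta
        R[:, j] = (em_map(pert) - theta_star) / delta
    return R

# Toy EM mapping (hypothetical): the mixing proportion of two fixed
# Gaussian components; normalizing constants cancel in the E-step.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(2.0, 1.0, 700)])
f1 = np.exp(-0.5 * (x + 2.0) ** 2)
f2 = np.exp(-0.5 * (x - 2.0) ** 2)

def em_map(theta):
    pi = theta[0]
    g = pi * f1 / (pi * f1 + (1.0 - pi) * f2)  # E-step responsibilities
    return np.array([g.mean()])                # M-step update of pi

theta = np.array([0.5])
for _ in range(200):        # iterate EM to (numerical) convergence
    theta = em_map(theta)
R = sem_rate_matrix(em_map, theta)
```

At a regular MLE the rate matrix equals the matrix fraction of missing information, with eigenvalues in \([0, 1)\), which is what makes the SEM variance formula \(V = I_{oc}^{-1} + I_{oc}^{-1}\,\mathrm{DM}\,(I - \mathrm{DM})^{-1}\) well defined.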

  3. This method was implemented in the R package fungible (Waller & Jones, 2016).

  4. The reformed USEM refers to the USEM approach equipped with the proposed heuristic solution and element-wise convergence criterion.
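An element-wise convergence criterion of the kind Note 4 refers to can be realized by freezing each element of the rate matrix once its change between successive SEM iterations drops below a tolerance, so slow elements keep iterating while settled ones stop. The helper below is a hypothetical sketch of that bookkeeping (names and the tolerance are assumptions, not the authors' code):

```python
import numpy as np

def elementwise_freeze(R_new, R_old, frozen, eps=1e-4):
    """Keep previously frozen elements at their old values, accept the
    rest from the new iterate, and mark as frozen any element whose
    change this iteration fell below eps (it is then held fixed from
    the next call onward)."""
    R = np.where(frozen, R_old, R_new)
    frozen = frozen | (np.abs(R_new - R_old) < eps)
    return R, frozen
```

The SEM loop can then stop once `frozen.all()` is true, instead of waiting for a global matrix norm to fall below the tolerance.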

  5. In the pilot study, three levels of the number of nodes per dimension (5, 8, and 10) were considered to investigate the impact of the number of nodes on the estimation accuracy of the item parameters and their SEs. Results indicated that 8 nodes per dimension is adequate overall. This issue is discussed further in the final section.
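The cost behind this pilot study is exponential: a tensor-product grid with \(K\) nodes per dimension has \(K^{d}\) support points, so 8 nodes in three dimensions already mean 512 mixture components. A minimal sketch, assuming equally spaced nodes on \([-4, 4]\) (the node placement used in the paper may differ):

```python
import numpy as np
from itertools import product

def tensor_grid(n_nodes, n_dim, lo=-4.0, hi=4.0):
    """Tensor-product quadrature grid: n_nodes equally spaced points per
    dimension, giving n_nodes ** n_dim support points in total."""
    axis = np.linspace(lo, hi, n_nodes)
    return np.array(list(product(axis, repeat=n_dim)))
```

For example, `tensor_grid(8, 3)` returns a `(512, 3)` array; each extra dimension multiplies the grid size by another factor of \(K\), which is why checking the smallest adequate \(K\) in a pilot run matters.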

References

  • Ackerman, T. (1996). Graphical representation of multidimensional item response theory analyses. Applied Psychological Measurement, 20, 311–329.

  • Baker, F. B., & Kim, S. H. (2004). Item response theory: Parameter estimation techniques (2nd ed.). New York: Dekker.

  • Bartolucci, F. (2007). A class of multidimensional IRT models for testing unidimensionality and clustering items. Psychometrika, 72, 141–157.

  • Bartolucci, F., Bacci, S., & Gnaldi, M. (2014). MultiLCIRT: An R package for multidimensional latent class item response models. Computational Statistics and Data Analysis, 71, 971–985.

  • Bartolucci, F., Bacci, S., & Gnaldi, M. (2015). Statistical analysis of questionnaires: A unified approach based on Stata and R. Boca Raton: Chapman and Hall/CRC Press.

  • Birnbaum, A. (1968). Some latent trait models. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

  • Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443–459.

  • Bolt, D. M., & Lall, V. F. (2003). Estimation of compensatory and non-compensatory multidimensional item response models using Markov chain Monte Carlo. Applied Psychological Measurement, 27, 395–414.

  • Bolt, D. (2005). Limited and full information estimation of item response theory models. In A. Maydeu-Olivares & J. J. McArdle (Eds.), Contemporary psychometrics: A festschrift for Roderick P. McDonald (pp. 27–71). Mahwah, NJ: Lawrence Erlbaum Associates.

  • Bono, R., Blanca, M. J., Arnau, J., & Gómez-Benito, J. (2017). Non-normal distributions commonly used in health, education, and social sciences: A systematic review. Frontiers in Psychology, 8, 1602.

  • Cai, L. (2008). SEM of another flavour: Two new applications of the supplemented EM algorithm. British Journal of Mathematical and Statistical Psychology, 61, 309–329.

  • Cai, L. (2010). Metropolis–Hastings Robbins–Monro algorithm for confirmatory item factor analysis. Journal of Educational and Behavioral Statistics, 35, 307–335.

  • Cai, L., & Hansen, M. (2013). Limited-information goodness-of-fit testing of hierarchical item factor models. British Journal of Mathematical and Statistical Psychology, 66, 245–276.

  • Cai, L., & Thissen, D. (2015). Modern approaches to parameter estimation in item response theory. In S. P. Reise & D. A. Revicki (Eds.), Handbook of item response theory modeling: Applications to typical performance assessment (pp. 41–59). New York, NY: Routledge.

  • Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48, 1–29.

  • Chang, H. H., Qian, J. H., & Ying, Z. L. (2001). \(a\)-stratified multistage computerized adaptive testing with \(b\) blocking. Applied Psychological Measurement, 25, 333–341.

  • Chen, P. (2017). A comparative study of online item calibration methods in multidimensional computerized adaptive testing. Journal of Educational and Behavioral Statistics, 42, 559–590.

  • Chen, P., & Wang, C. (2016). A new online calibration method for multidimensional computerized adaptive testing. Psychometrika, 81, 674–701.

  • Chen, P., Wang, C., Xin, T., & Chang, H.-H. (2017). Developing new online calibration methods for multidimensional computerized adaptive testing. British Journal of Mathematical and Statistical Psychology, 70, 81–117.

  • Chen, Y., Li, X., & Zhang, S. (2019). Joint maximum likelihood estimation for high-dimensional exploratory item factor analysis. Psychometrika, 84, 124–146.

  • Curran, P. J., West, S. G., & Finch, J. F. (1996). The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis. Psychological Methods, 1, 16–29.

  • de la Torre, J. (2009). DINA model and parameter estimation: A didactic. Journal of Educational and Behavioral Statistics, 34, 115–130.

  • Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society: Series B, 39, 1–38.

  • Edwards, M. C. (2010). A Markov chain Monte Carlo approach to confirmatory item factor analysis. Psychometrika, 75, 474–497.

  • Haberman, S. J., von Davier, M., & Lee, Y.-H. (2008). Comparison of multidimensional item response models: Multivariate normal ability distributions versus multivariate polytomous ability distributions (ETS Research Report No. RR-08-45). Princeton, NJ: ETS.

  • Heinen, T. (1996). Latent class and discrete latent trait models: Similarities and differences. Thousand Oaks, CA: Sage Publications.

  • Jamshidian, M., & Jennrich, R. I. (2000). Standard errors for EM estimation. Journal of the Royal Statistical Society: Series B, 62, 257–270.

  • Kim, S. (2006). A comparative study of IRT fixed parameter calibration methods. Journal of Educational Measurement, 43, 355–381.

  • Kim, S., & Kolen, M. J. (2016). Multiple group IRT fixed-parameter estimation for maintaining an established ability scale (CASMA Research Report No. 49). Iowa City, IA: University of Iowa.

  • Lewis, C. (1985). Discussion. In D. J. Weiss (Ed.), Proceedings of the 1982 item response theory and computerized adaptive testing conference (pp. 203–209). Minneapolis: University of Minnesota, Department of Psychology, Computerized Adaptive Testing Laboratory.

  • Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Menlo Park: Addison-Wesley.

  • Meng, X.-L., & Rubin, D. B. (1991). Using EM to obtain asymptotic variance–covariance matrices: The SEM algorithm. Journal of the American Statistical Association, 86, 899–909.

  • Meng, X.-L., & Schilling, S. G. (1996). Fitting full-information factor models and an empirical investigation of bridge sampling. Journal of the American Statistical Association, 91, 1254–1267.

  • Mislevy, R. J. (1984). Estimating latent distributions. Psychometrika, 49, 359–381.

  • Orchard, T., & Woodbury, M. A. (1972). A missing information principle: Theory and application. In L. M. LeCam, J. Neyman, & E. L. Scott (Eds.), Proceedings of the sixth Berkeley symposium on mathematical statistics and probability (pp. 697–715). Berkeley, CA: University of California Press.

  • Paek, I., & Cai, L. (2014). A comparison of item parameter standard error estimation procedures for unidimensional and multidimensional item response theory modeling. Educational and Psychological Measurement, 74, 58–76.

  • Reckase, M. D. (2009). Multidimensional item response theory. New York, NY: Springer.

  • Schilling, S., & Bock, R. D. (2005). High-dimensional maximum marginal likelihood item factor analysis by adaptive quadrature. Psychometrika, 70, 533–555.

  • Segall, D. O. (1996). Multidimensional adaptive testing. Psychometrika, 61, 331–354.

  • Tian, W., Cai, L., Thissen, D., & Xin, T. (2013). Numerical differentiation methods for computing error covariance matrices in item response theory modeling: An evaluation and a new proposal. Educational and Psychological Measurement, 73, 412–439.

  • Titterington, D. M., Smith, A. F. M., & Makov, U. E. (1985). Statistical analysis of finite mixture distributions. New York: Wiley.

  • Vale, C. D., & Maurelli, V. A. (1983). Simulating multivariate nonnormal distributions. Psychometrika, 48, 465–471.

  • von Davier, M. (2008). A general diagnostic model applied to language testing data. British Journal of Mathematical and Statistical Psychology, 61, 287–307.

  • Waller, N., & Jones, J. (2016). fungible: Fungible coefficients and Monte Carlo functions [R package]. https://cran.r-project.org/web/packages/fungible/fungible.pdf

  • Wang, C. (2015). On latent trait estimation in multidimensional compensatory item response models. Psychometrika, 80, 428–449.

  • Wang, C., & Chang, H. (2011). Item selection in multidimensional computerized adaptive testing: Gaining information from different angles. Psychometrika, 76, 363–384.

  • Wang, C., Su, S. Y., & Weiss, D. J. (2018). Robustness of parameter estimation to assumptions of normality in the multidimensional graded response model. Multivariate Behavioral Research, 53, 403–418.

  • Woodruff, D. J., & Hanson, B. A. (1996). Estimation of item response models using the EM algorithm for finite mixtures (ACT Research Report No. 96-6). Iowa City, IA: ACT Inc.

  • Woods, C. M. (2007). Empirical histograms in item response theory with ordinal data. Educational and Psychological Measurement, 67, 73–87.

  • Woods, C. M. (2015). Estimating the latent density in unidimensional IRT to permit non-normality. In S. P. Reise & D. A. Revicki (Eds.), Handbook of item response theory modeling: Applications to typical performance assessment (pp. 60–84). New York, NY: Routledge.

  • Yao, L. H. (2012). Multidimensional CAT item selection methods for domain scores and composite scores: Theory and applications. Psychometrika, 77, 495–523.

  • Yen, W. M. (1984). Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Applied Psychological Measurement, 8, 125–145.

  • Zhang, H., Chen, Y., & Li, X. (2019). A note on exploratory item factor analysis by singular value decomposition. arXiv preprint arXiv:1907.08713.


Acknowledgements

This study was partially supported by the National Natural Science Foundation of China (Grant No. 32071092) and the Research Program Funds of the Collaborative Innovation Center of Assessment toward Basic Education Quality (Grant Nos. 2019-01-082-BZK01 and 2019-01-082-BZK02). The authors are indebted to the editor, the associate editor, and three anonymous reviewers for their constructive suggestions and comments on an earlier version of the manuscript.

Author information

Corresponding author: Ping Chen.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Chen, P., Wang, C. Using EM Algorithm for Finite Mixtures and Reformed Supplemented EM for MIRT Calibration. Psychometrika 86, 299–326 (2021). https://doi.org/10.1007/s11336-021-09745-6

