Latent class analysis variable selection

Annals of the Institute of Statistical Mathematics

Abstract

We propose a method for selecting variables in latent class analysis, which is the most common model-based clustering method for discrete data. The method assesses a variable’s usefulness for clustering by comparing two models, given the clustering variables already selected. In one model the variable contributes information about cluster allocation beyond that contained in the already selected variables; in the other model it does not. A headlong search algorithm is used to explore the model space and select clustering variables. In simulated datasets, the method selected the correct clustering variables and also improved both classification performance and the accuracy of the choice of the number of classes. In two real datasets, our method discovered the same group structure while using fewer variables. In a dataset from the International HapMap Project consisting of 639 single nucleotide polymorphisms (SNPs) from 210 members of different groups, our method discovered the same group structure with a much smaller number of SNPs.
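The model comparison described above can be illustrated with a small Python sketch. It rests on several assumptions that are not taken from the paper: latent class models are fitted with a plain EM algorithm, models are compared with BIC written as 2 log L - d log n (larger is better), the number of classes in the clustering model is chosen by BIC, and in the model where the proposed variable does not help with clustering, that variable is treated as independent of the variables already selected. "Headlong" is interpreted here as accepting the first candidate whose BIC difference exceeds a threshold, rather than scanning all candidates for the best one. The helper names (`fit_lca`, `best_bic`, `columns`, `headlong_add`) and the default threshold are hypothetical.

```python
import math
import random

# Hypothetical helpers for illustration; not the authors' implementation.


def fit_lca(data, n_classes, n_iter=100, seed=0):
    """Fit a latent class model to rows of category codes by EM.

    Returns (log-likelihood, number of free parameters).
    """
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    levels = [sorted({row[j] for row in data}) for j in range(p)]
    # Start EM from random soft class memberships.
    resp = []
    for _ in range(n):
        w = [rng.random() + 1e-3 for _ in range(n_classes)]
        resp.append([x / sum(w) for x in w])
    loglik = float("-inf")
    for _ in range(n_iter):
        # M-step: class weights and class-specific category probabilities
        # (a tiny pseudo-count keeps every probability positive).
        weights = [sum(r[g] for r in resp) / n for g in range(n_classes)]
        probs = []
        for g in range(n_classes):
            tot = sum(r[g] for r in resp)
            probs.append([
                {c: (sum(r[g] for r, row in zip(resp, data) if row[j] == c) + 1e-6)
                    / (tot + 1e-6 * len(levels[j]))
                 for c in levels[j]}
                for j in range(p)
            ])
        # E-step: posterior class probabilities under local independence;
        # the log-likelihood is that of the parameters just estimated.
        loglik = 0.0
        for i, row in enumerate(data):
            joint = [weights[g] * math.prod(probs[g][j][row[j]] for j in range(p))
                     for g in range(n_classes)]
            total = sum(joint)
            loglik += math.log(total)
            resp[i] = [x / total for x in joint]
    n_params = (n_classes - 1) + n_classes * sum(len(lv) - 1 for lv in levels)
    return loglik, n_params


def bic(loglik, n_params, n):
    """BIC in the 2*loglik - d*log(n) form (larger is better)."""
    return 2.0 * loglik - n_params * math.log(n)


def best_bic(rows, max_classes):
    """Best BIC over latent class models with 1..max_classes classes."""
    return max(bic(*fit_lca(rows, g), len(rows)) for g in range(1, max_classes + 1))


def columns(data, idx):
    """Restrict each row to the variables listed in idx."""
    return [tuple(row[j] for j in idx) for row in data]


def headlong_add(data, selected, candidates, max_classes=3, threshold=1e-6):
    """Return the first candidate whose BIC difference exceeds threshold, else None.

    The small positive default threshold stops floating-point noise from
    counting a tie (both models reduce to one class) as a gain.
    """
    n = len(data)
    base = best_bic(columns(data, selected), max_classes)
    for j in candidates:
        # Model 1: j is clustered together with the selected variables.
        clust = best_bic(columns(data, selected + [j]), max_classes)
        # Model 2 (ASSUMPTION): j is independent of the selected variables,
        # modelled by a one-class model for j alone.
        not_clust = base + bic(*fit_lca(columns(data, [j]), 1), n)
        if clust - not_clust > threshold:
            return j  # headlong: accept the first variable that passes
    return None
```

With data in which variable 1 tracks the latent class and variable 2 is pure noise, `headlong_add(data, [0], [2, 1])` would be expected to skip variable 2 and return 1. A complete search would repeat such steps until no candidate passes, and would typically also consider removing variables that no longer help; only a single inclusion step is shown here.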

Author information

Correspondence to Adrian E. Raftery.

About this article

Cite this article

Dean, N., Raftery, A.E. Latent class analysis variable selection. Ann Inst Stat Math 62, 11–35 (2010). https://doi.org/10.1007/s10463-009-0258-9

  • DOI: https://doi.org/10.1007/s10463-009-0258-9
