A Modified k-Means Clustering Procedure for Obtaining a Cardinality-Constrained Centroid Matrix

Journal of Classification

Abstract

k-means clustering is a well-known procedure for classifying multivariate observations. The resulting clusters-by-variables centroid matrix is used to interpret which variables characterize each cluster. However, between-cluster differences are not always clearly captured in the centroid matrix. We address this problem by proposing a new procedure for obtaining a centroid matrix in which a number of elements are exactly zero. This makes the matrix easy to interpret, because attention can be restricted to the nonzero centroids. We describe the development of an iterative algorithm for the constrained minimization. We also present a cardinality selection procedure for identifying the optimal cardinality, as well as a modified version of the proposed procedure in which restrictions are imposed on the positions of the nonzero elements. The behavior of the proposed procedure was evaluated in simulation studies and is illustrated with three real data examples, which demonstrate that its performance is promising.
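
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way such a constrained minimization could be set up, not the authors' implementation. It assumes the usual k-means least-squares loss, a hypothetical function name `sparse_centroid_kmeans`, and a hypothetical parameter `cardinality` giving the maximum number of nonzero centroid elements. For fixed cluster memberships, zeroing the element for cluster k and variable j raises the loss by n_k times the squared cluster mean, so the sketch keeps the elements with the largest such values and sets the rest to zero; it then reassigns observations to their nearest centroid and repeats.

```python
import numpy as np


def sparse_centroid_kmeans(X, n_clusters, cardinality, n_iter=100, seed=0):
    """Illustrative alternating scheme (an assumption, not the published algorithm).

    Returns cluster labels, a centroid matrix with at most `cardinality`
    nonzero elements, and the final least-squares loss.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    labels = rng.integers(n_clusters, size=n)  # random initial partition
    prev_loss = np.inf
    for _ in range(n_iter):
        # Centroid step: unconstrained cluster means for the current partition.
        counts = np.bincount(labels, minlength=n_clusters).astype(float)
        sums = np.zeros((n_clusters, p))
        np.add.at(sums, labels, X)
        means = sums / np.maximum(counts, 1.0)[:, None]

        # Zeroing element (k, j) costs counts[k] * means[k, j]**2, so keep
        # the `cardinality` elements with the largest cost and zero the rest.
        cost_of_zeroing = counts[:, None] * means**2
        keep = np.argsort(cost_of_zeroing, axis=None)[::-1][:cardinality]
        C = np.zeros_like(means)
        C.flat[keep] = means.flat[keep]

        # Assignment step: move each observation to its nearest sparse centroid.
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        loss = d2[np.arange(n), labels].sum()

        # Stop when the loss no longer decreases.
        if prev_loss - loss < 1e-12:
            break
        prev_loss = loss
    return labels, C, loss
```

Like ordinary k-means, a scheme of this kind can stop at a local minimum, so in practice it would be run from several random starts and the solution with the smallest loss retained.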

Acknowledgments

The authors are deeply grateful for the helpful comments and suggestions of the associate editor.

Author information

Corresponding author

Correspondence to Naoto Yamashita.

Cite this article

Yamashita, N., Adachi, K. A Modified k-Means Clustering Procedure for Obtaining a Cardinality-Constrained Centroid Matrix. J Classif 37, 509–525 (2020). https://doi.org/10.1007/s00357-019-09324-6

  • DOI: https://doi.org/10.1007/s00357-019-09324-6
