On the Behaviour of K-Means Clustering of a Dissimilarity Matrix by Means of Full Multidimensional Scaling

Vera, J. Fernando; Macías, Rodrigo

doi:10.1007/s11336-021-09757-2

Theory and Methods
Published: 19 May 2021

Volume 86, pages 489–513, (2021)

Psychometrika

Abstract

In this article, we analyse the usefulness of multidimensional scaling for performing K-means clustering on a dissimilarity matrix when the dimensionality of the objects is unknown. In this situation, traditional algorithms cannot be used, so K-means clustering procedures are performed directly on the observed dissimilarity matrix. Furthermore, criteria for determining the number of clusters that were originally formulated for two-mode data sets can be applied only if they can be reformulated for the one-mode situation. The linear invariance property of K-means clustering for squared dissimilarities, together with the use of multidimensional scaling, is investigated to determine the cluster membership of the observations and to address the problem of selecting the number of clusters in K-means for a dissimilarity matrix. In particular, we analyse the performance of K-means clustering on the full-dimensional scaling configuration and on the equivalently partitioned configuration related to a suitable translation of the squared dissimilarities. A Monte Carlo experiment is conducted in which the methodology examined is compared with procedures directly applicable to a dissimilarity matrix.
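The invariance idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, and the sketch assumes that the "translation" adds a single constant to every off-diagonal squared dissimilarity, chosen from the smallest eigenvalue of the double-centred matrix so that a full-dimensional Euclidean configuration exists. Under that assumption, the within-cluster sum of squares of any partition into K clusters of n objects changes by the same amount, shift * (n - K) / 2, so the ranking of partitions for a fixed K is unaffected.

```python
import numpy as np


def wcss_from_dissimilarities(sq_diss, labels):
    """Within-cluster sum of squares using only squared dissimilarities.

    For each cluster, the sum of all pairwise squared dissimilarities
    (both orderings of each pair) is divided by twice the cluster size.
    """
    total = 0.0
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        block = sq_diss[np.ix_(idx, idx)]
        total += block.sum() / (2.0 * idx.size)
    return total


def full_mds_configuration(sq_diss):
    """Classical scaling in full dimension after an additive shift.

    Assumption of this sketch: the shift is -2 times the smallest
    eigenvalue of the double-centred matrix (zero if that matrix is
    already positive semidefinite), added to off-diagonal entries only.
    """
    n = sq_diss.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ sq_diss @ centering
    smallest = np.linalg.eigvalsh(gram)[0]  # eigenvalues come back ascending
    shift = max(0.0, -2.0 * smallest)
    shifted = sq_diss + shift * (1.0 - np.eye(n))
    gram_shifted = -0.5 * centering @ shifted @ centering
    eigval, eigvec = np.linalg.eigh(gram_shifted)
    eigval = np.clip(eigval, 0.0, None)  # remove tiny negative round-off
    config = eigvec * np.sqrt(eigval)  # rows are object coordinates
    return config, shift
```

Any standard K-means routine could then be run on the rows of `config`; the Euclidean within-cluster sum of squares it reports should equal `wcss_from_dissimilarities(sq_diss, labels) + shift * (n - K) / 2` for the same labels.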

Acknowledgements

This work has been partially supported by Grants ECO2013-48413-R of the Spanish Ministry of Economy and Competitiveness, co-financed by FEDER, and RTI2018-099723-B-I00, Ministry of Science and Innovation—State Research Agency of Spain, co-financed by FEDER (J. Fernando Vera) and CB-252996, CONACYT, México (Rodrigo Macías).

Author information

Authors and Affiliations

Department of Statistics and O.R., University of Granada, Granada, Spain
J. Fernando Vera
Centro De Investigación En Matemáticas, Unidad Monterrey, Mexico
Rodrigo Macías

Corresponding author

Correspondence to J. Fernando Vera.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 53 KB)

About this article

Cite this article

Vera, J.F., Macías, R. On the Behaviour of K-Means Clustering of a Dissimilarity Matrix by Means of Full Multidimensional Scaling. Psychometrika 86, 489–513 (2021). https://doi.org/10.1007/s11336-021-09757-2

Received: 20 July 2019
Revised: 25 September 2020
Accepted: 26 February 2021
Published: 19 May 2021
Issue Date: June 2021
DOI: https://doi.org/10.1007/s11336-021-09757-2
