Abstract
Clustering is an important statistical tool for the analysis of unlabeled data. Among the many clustering algorithms, spectral clustering and stochastic block models, both based on networks and graphs, are well established and widely used for community detection. In this paper, we review and discuss important statistical issues in spectral clustering and stochastic block models.
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2019R1A2C1007193).
Baek, M., Kim, C. A review on spectral clustering and stochastic block models. J. Korean Stat. Soc. 50, 818–831 (2021). https://doi.org/10.1007/s42952-021-00112-w
DOI: https://doi.org/10.1007/s42952-021-00112-w