A review on spectral clustering and stochastic block models

  • Review Article
  • Journal of the Korean Statistical Society

Abstract

Clustering is an important statistical tool for analyzing unlabeled data. Among the many clustering algorithms, spectral clustering and stochastic block models, both based on networks and graphs, are well established and widely used for community detection. In this paper, we review and discuss important statistical issues in spectral clustering and stochastic block models.



Author information

Corresponding author

Correspondence to Choongrak Kim.

Additional information

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No.2019R1A2C1007193).

Cite this article

Baek, M., Kim, C. A review on spectral clustering and stochastic block models. J. Korean Stat. Soc. 50, 818–831 (2021). https://doi.org/10.1007/s42952-021-00112-w

  • DOI: https://doi.org/10.1007/s42952-021-00112-w
