Certifying Global Optimality of Graph Cuts via Semidefinite Relaxation: A Performance Guarantee for Spectral Clustering

Abstract

Spectral clustering has become one of the most widely used clustering techniques when the structure of the individual clusters is non-convex or highly anisotropic. Yet, despite its immense popularity, there is fairly little theory on performance guarantees for spectral clustering. This issue is partly due to the fact that spectral clustering typically involves two steps which complicate its theoretical analysis: first, the eigenvectors of the associated graph Laplacian are used to embed the dataset, and second, the k-means clustering algorithm is applied to the embedded dataset to obtain the labels. This paper is devoted to the theoretical foundations of spectral clustering and graph cuts. We consider a convex relaxation of graph cuts, namely ratio cuts and normalized cuts, that makes the usual two-step approach of spectral clustering obsolete and at the same time gives rise to a rigorous theoretical analysis of graph cuts and spectral clustering. We derive deterministic bounds for successful spectral clustering via a spectral proximity condition that naturally depends on the algebraic connectivity of each cluster and the inter-cluster connectivity. Moreover, we demonstrate by means of some popular examples that our bounds can achieve near optimality. Our findings are also fundamental to the theoretical understanding of kernel k-means. Numerical simulations confirm and complement our analysis.
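
The classical two-step pipeline described above, which the convex relaxation studied in this paper renders obsolete, can be summarized in a few lines. The following is a minimal sketch, assuming a symmetric weight matrix W, the unnormalized (ratio-cut) Laplacian, and the availability of NumPy, SciPy, and scikit-learn; the function name is illustrative and not taken from the paper.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.cluster import KMeans

def two_step_spectral_clustering(W, k):
    """Classical two-step spectral clustering of a weighted graph."""
    # Unnormalized graph Laplacian L = D - W (ratio-cut variant).
    L = laplacian(np.asarray(W, dtype=float), normed=False)
    # Step 1: embed the vertices using the k eigenvectors of L
    # associated with the k smallest eigenvalues.
    _, eigvecs = np.linalg.eigh(L)
    embedding = eigvecs[:, :k]
    # Step 2: run k-means on the embedded points to obtain the labels.
    return KMeans(n_clusters=k, n_init=10).fit_predict(embedding)
```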

Notes

  1. The eigengap refers to the difference between the first and the second largest eigenvalues of the Markov transition matrix.

  2. It is clear that in the given example a simple rescaling of the data would make them more isotropic, but this is not the point we try to illustrate. Also, in more involved examples consisting of anisotropic clusters of different orientation, rescaling or resorting to, e.g., the Mahalanobis distance instead of the Euclidean distance will not really overcome the sensitivity of k-means to “geometric distortions.”

  3. Surface tension is defined as \(\epsilon _{\varPhi } = \int _{{\mathbb {R}}^m}|z^{(1)}|^2\varPhi (z)\, \mathrm {d}z \), where \(z^{(1)}\) is the first component of \(z\).

  4. Here, the Goemans–Williamson type of SDP relaxation is given by \(\max \text {Tr}( (2W - (1_N1_N^{\top } -I_N ))Z )\), s.t. \(Z\succeq 0\) and \(Z_{ii} = 1\). Note that this relaxation is designed specifically for the case of two clusters; a small solver sketch is given after these notes.

  5. The dual cone \({\mathcal {K}}^*\) of \({\mathcal {K}}\) is defined as \(\{W : \langle W, Z\rangle \ge 0, \forall Z\in {\mathcal {K}}\}\); in particular, \(({\mathcal {K}}^*)^* = {\mathcal {K}}\) holds.

  6. The cone \({\mathcal {K}}\) is pointed if \(Z\in {\mathcal {K}}\) and \(-Z\in {\mathcal {K}}\) together imply \(Z = 0\); see Chapter 2 in [14].
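
For concreteness, the Goemans–Williamson type relaxation in Note 4 can be stated directly in a convex modeling language. The following is a minimal sketch, assuming CVXPY with its default semidefinite-capable solver is available; the function name is illustrative and not part of the paper.

```python
import numpy as np
import cvxpy as cp

def gw_two_cluster_sdp(W):
    """Solve max Tr((2W - (1 1^T - I)) Z) s.t. Z PSD and Z_ii = 1."""
    N = W.shape[0]
    C = 2 * np.asarray(W, dtype=float) - (np.ones((N, N)) - np.eye(N))
    Z = cp.Variable((N, N), PSD=True)   # positive semidefinite variable
    constraints = [cp.diag(Z) == 1]     # unit diagonal
    problem = cp.Problem(cp.Maximize(cp.trace(C @ Z)), constraints)
    problem.solve()
    return Z.value
```

A two-cluster labeling can then be read off, e.g., from the sign pattern of the leading eigenvector of the returned matrix, or via randomized rounding in the spirit of Goemans–Williamson.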

References

  1. E. Abbe. Community detection and stochastic block models: recent developments. The Journal of Machine Learning Research, 18(1):6446–6531, 2017.

  2. E. Abbe, A. S. Bandeira, and G. Hall. Exact recovery in the stochastic block model. IEEE Transactions on Information Theory, 62(1):471–487, 2016.

  3. N. Agarwal, A. S. Bandeira, K. Koiliaris, and A. Kolla. Multisection in the stochastic block model using semidefinite programming. In Compressed Sensing and its Applications, pages 125–162. Springer, 2017.

  4. D. Aloise, A. Deshpande, P. Hansen, and P. Popat. NP-hardness of Euclidean sum-of-squares clustering. Machine Learning, 75(2):245–248, 2009.

  5. A. A. Amini and E. Levina. On semidefinite relaxations for the block model. The Annals of Statistics, 46(1):149–179, 2018.

  6. S. Arora, S. Rao, and U. Vazirani. Expander flows, geometric embeddings and graph partitioning. Journal of the ACM (JACM), 56(2):5, 2009.

  7. D. Arthur and S. Vassilvitskii. k-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1027–1035. Society for Industrial and Applied Mathematics, 2007.

  8. P. Awasthi, A. S. Bandeira, M. Charikar, R. Krishnaswamy, S. Villar, and R. Ward. Relax, no need to round: Integrality of clustering formulations. In Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, pages 191–200. ACM, 2015.

  9. P. Awasthi and O. Sheffet. Improved spectral-norm bounds for clustering. In APPROX-RANDOM, pages 37–49. Springer, 2012.

  10. A. S. Bandeira. Random Laplacian matrices and convex relaxations. Foundations of Computational Mathematics, 18(2):345–379, Apr 2018.

  11. M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in Neural Information Processing Systems, pages 585–591, 2002.

  12. M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.

  13. M. Belkin and P. Niyogi. Towards a theoretical foundation for Laplacian-based manifold methods. In International Conference on Computational Learning Theory, pages 486–500. Springer, 2005.

  14. A. Ben-Tal and A. Nemirovski. Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. SIAM, 2001.

  15. J. A. Bondy and U. S. R. Murty. Graph Theory with Applications, volume 290. Macmillan London, 1976.

  16. S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

  17. A. E. Brouwer and W. H. Haemers. Spectra of Graphs. Springer Science+Business Media, 2011.

  18. F. R. Chung. Spectral Graph Theory, volume 92. American Mathematical Society, 1997.

  19. R. R. Coifman and S. Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21(1):5–30, 2006.

  20. R. R. Coifman, S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. Warner, and S. W. Zucker. Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. Proceedings of the National Academy of Sciences of the United States of America, 102(21):7426–7431, 2005.

  21. C. Davis and W. M. Kahan. The rotation of eigenvectors by a perturbation. III. SIAM Journal on Numerical Analysis, 7(1):1–46, 1970.

  22. I. S. Dhillon, Y. Guan, and B. Kulis. Kernel k-means: spectral clustering and normalized cuts. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 551–556. ACM, 2004.

  23. M. P. Do Carmo. Riemannian Geometry. Birkhauser, 1992.

  24. G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 3rd edition, 1996.

  25. T. H. Grönwall. Note on the derivatives with respect to a parameter of the solutions of a system of differential equations. Annals of Mathematics, pages 292–296, 1919.

  26. L. Hagen and A. B. Kahng. New spectral methods for ratio cut partitioning and clustering. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 11(9):1074–1085, 1992.

  27. T. Hastie, R. Tibshirani, and J. Friedman. Unsupervised learning. In The Elements of Statistical Learning, pages 485–585. Springer, 2009.

  28. T. Iguchi, D. G. Mixon, J. Peterson, and S. Villar. Probably certifiably correct k-means clustering. Mathematical Programming, 165(2):605–642, 2017.

  29. A. K. Jain. Data clustering: 50 years beyond k-means. Pattern Recognition Letters, 31(8):651–666, 2010.

  30. A. Kumar and R. Kannan. Clustering with spectral norm and the k-means algorithm. In 2010 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 299–308. IEEE, 2010.

  31. J. Lei and A. Rinaldo. Consistency of spectral clustering in stochastic block models. The Annals of Statistics, 43(1):215–237, 2015.

  32. D. A. Levin, Y. Peres, and E. L. Wilmer. Markov Chains and Mixing Times, volume 107. American Mathematical Society, 2017.

  33. X. Li, Y. Li, S. Ling, T. Strohmer, and K. Wei. When do birds of a feather flock together? k-means, proximity, and conic programming. Mathematical Programming, pages 1–47, 2018.

  34. S. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, 1982.

  35. M. Mahajan, P. Nimbhorkar, and K. Varadarajan. The planar k-means problem is NP-hard. In International Workshop on Algorithms and Computation, pages 274–285. Springer, 2009.

  36. D. G. Mixon, S. Villar, and R. Ward. Clustering subgaussian mixtures by semidefinite programming. Information and Inference: A Journal of the IMA, 6(4):389–415, 2017.

  37. A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: analysis and an algorithm. In Advances in Neural Information Processing Systems, pages 849–856, 2002.

  38. J. Peng and Y. Wei. Approximating k-means-type clustering via semidefinite programming. SIAM Journal on Optimization, 18(1):186–205, 2007.

  39. K. Rohe, S. Chatterjee, and B. Yu. Spectral clustering and the high-dimensional stochastic blockmodel. The Annals of Statistics, pages 1878–1915, 2011.

  40. J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.

  41. A. Singer. From graph to manifold Laplacian: The convergence rate. Applied and Computational Harmonic Analysis, 21(1):128–134, 2006.

  42. A. Singer and H.-T. Wu. Spectral convergence of the connection Laplacian from random samples. Information and Inference: A Journal of the IMA, 6(1):58–123, 2016.

  43. G. W. Stewart. Perturbation theory for the singular value decomposition. Technical Report CS-TR-2539, University of Maryland, Sep 1990.

  44. M. Tepper, A. M. Sengupta, and D. Chklovskii. Clustering is semidefinitely not that hard: Nonnegative SDP for manifold disentangling. The Journal of Machine Learning Research, 19(1):3208–3237, 2018.

  45. N. G. Trillos, M. Gerlach, M. Hein, and D. Slepcev. Error estimates for spectral convergence of the graph Laplacian on random geometric graphs towards the Laplace-Beltrami operator. arXiv preprint arXiv:1801.10108, 2018.

  46. N. G. Trillos and D. Slepčev. A variational approach to the consistency of spectral clustering. Applied and Computational Harmonic Analysis, 45(2):239–281, 2018.

  47. J. A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4):389–434, 2012.

  48. R. Vershynin. Introduction to the non-asymptotic analysis of random matrices. In Y. C. Eldar and G. Kutyniok, editors, Compressed Sensing: Theory and Applications, chapter 5. Cambridge University Press, 2012.

  49. U. Von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.

  50. U. Von Luxburg, M. Belkin, and O. Bousquet. Consistency of spectral clustering. The Annals of Statistics, pages 555–586, 2008.

  51. D. Wagner and F. Wagner. Between min cut and graph bisection. In International Symposium on Mathematical Foundations of Computer Science, pages 744–750. Springer, 1993.

  52. W. Walter. Ordinary Differential Equations, volume 182. Springer Science & Business Media, 1998.

  53. E. P. Xing and M. I. Jordan. On semidefinite relaxation for normalized k-cut and connections to spectral clustering. Technical Report UCB/CSD-03-1265, EECS Department, University of California, Berkeley, Jun 2003.

  54. B. Yan, P. Sarkar, and X. Cheng. Provable estimation of the number of blocks in block models. In Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, volume 84 of Proceedings of Machine Learning Research, pages 1185–1194. PMLR, 09–11 Apr 2018.

Acknowledgements

S.L. thanks Afonso S. Bandeira for fruitful discussions about stochastic block models. The authors are also grateful to the anonymous referees for their careful reading of this paper and suggestions.

Author information

Corresponding author

Correspondence to Shuyang Ling.

Additional information

Communicated by James Renegar.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

T. Strohmer acknowledges partial support from the NSF via Grants DMS 1620455 and DMS 1737943.

About this article

Cite this article

Ling, S., Strohmer, T. Certifying Global Optimality of Graph Cuts via Semidefinite Relaxation: A Performance Guarantee for Spectral Clustering. Found Comput Math 20, 367–421 (2020). https://doi.org/10.1007/s10208-019-09421-3
