Abstract
Spectral clustering has become one of the most widely used clustering techniques when the structure of the individual clusters is non-convex or highly anisotropic. Yet, despite its immense popularity, there exists fairly little theory about performance guarantees for spectral clustering. This issue is partly due to the fact that spectral clustering typically involves two steps which complicated its theoretical analysis: First, the eigenvectors of the associated graph Laplacian are used to embed the dataset, and second, k-means clustering algorithm is applied to the embedded dataset to get the labels. This paper is devoted to the theoretical foundations of spectral clustering and graph cuts. We consider a convex relaxation of graph cuts, namely ratio cuts and normalized cuts, that makes the usual two-step approach of spectral clustering obsolete and at the same time gives rise to a rigorous theoretical analysis of graph cuts and spectral clustering. We derive deterministic bounds for successful spectral clustering via a spectral proximity condition that naturally depends on the algebraic connectivity of each cluster and the inter-cluster connectivity. Moreover, we demonstrate by means of some popular examples that our bounds can achieve near optimality. Our findings are also fundamental to the theoretical understanding of kernel k-means. Numerical simulations confirm and complement our analysis.
Similar content being viewed by others
Notes
The eigengap refers to the difference between the first and the second largest eigenvalues of the Markov transition matrix.
It is clear that in the given example simple rescaling of the data would make them more isotropic, but this is not the point we try to illustrate. Also, in more involved examples consisting of anisotropic clusters of different orientation, rescaling or resorting, e.g., to the Mahalanobis distance instead of the Euclidean distance will not really overcome the sensibility of k-means to “geometric distortions.”
Surface tension is defined as \(\epsilon _{\varPhi } = \int _{R^m}|z^{(1)}|^2\varPhi (z) \mathrm {d}z \), where \(z^{(1)}\) is the first component of z.
Here, the Goemans–Williamson type of SDP relaxation is given by \(\max \text {Tr}( (2W - (1_N1_N^{\top } -I_N ))Z )\), s.t. \(Z\succeq 0\) and \(Z_{ii} = 1\). Note this relaxation is designed specifically for the case of two clusters.
The dual cone \({\mathcal {K}}^*\) of \({\mathcal {K}}\) is defined as \(\{W : \langle W, Z\rangle \ge 0, \forall Z\in {\mathcal {K}}\}\); in particular, \(({\mathcal {K}}^*)^* = {\mathcal {K}}\) holds.
The cone \({\mathcal {K}}\) is pointed if for \(Z\in {\mathcal {K}}\) and \(-Z\in {\mathcal {K}}\), Z must be 0, see Chapter 2 in [14].
References
E. Abbe. Community detection and stochastic block models: recent developments. The Journal of Machine Learning Research, 18(1):6446–6531, 2017.
E. Abbe, A. S. Bandeira, and G. Hall. Exact recovery in the stochastic block model. IEEE Transactions on Information Theory, 62(1):471–487, 2016.
N. Agarwal, A. S. Bandeira, K. Koiliaris, and A. Kolla. Multisection in the stochastic block model using semidefinite programming. In Compressed Sensing and its Applications, pages 125–162. Springer, 2017.
D. Aloise, A. Deshpande, P. Hansen, and P. Popat. NP-hardness of Euclidean sum-of-squares clustering. Machine learning, 75(2):245–248, 2009.
A. A. Amini and E. Levina. On semidefinite relaxations for the block model. The Annals of Statistics, 46(1):149–179, 2018.
S. Arora, S. Rao, and U. Vazirani. Expander flows, geometric embeddings and graph partitioning. Journal of the ACM (JACM), 56(2):5, 2009.
D. Arthur and S. Vassilvitskii. k-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1027–1035. Society for Industrial and Applied Mathematics, 2007.
P. Awasthi, A. S. Bandeira, M. Charikar, R. Krishnaswamy, S. Villar, and R. Ward. Relax, no need to round: Integrality of clustering formulations. In Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, pages 191–200. ACM, 2015.
P. Awasthi and O. Sheffet. Improved spectral-norm bounds for clustering. In APPROX-RANDOM, pages 37–49. Springer, 2012.
A. S. Bandeira. Random laplacian matrices and convex relaxations. Foundations of Computational Mathematics, 18(2):345–379, Apr 2018.
M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in Neural Information Processing Systems, pages 585–591, 2002.
M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.
M. Belkin and P. Niyogi. Towards a theoretical foundation for Laplacian-based manifold methods. In International Conference on Computational Learning Theory, pages 486–500. Springer, 2005.
A. Ben-Tal and A. Nemirovski. Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. SIAM, 2001.
J. A. Bondy, U. S. R. Murty, et al. Graph Theory with Applications, volume 290. Macmillan London, 1976.
S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
A. E. Brouwer and W. H. Haemers. Spectra of Graphs. Springer Science+Business Media, 2011.
F. R. Chung. Spectral Graph Theory, volume 92. American Mathematical Society, 1997.
R. R. Coifman and S. Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21(1):5–30, 2006.
R. R. Coifman, S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. Warner, and S. W. Zucker. Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. Proceedings of the National Academy of Sciences of the United States of America, 102(21):7426–7431, 2005.
C. Davis and W. M. Kahan. The rotation of eigenvectors by a perturbation. iii. SIAM Journal on Numerical Analysis, 7(1):1–46, 1970.
I. S. Dhillon, Y. Guan, and B. Kulis. Kernel k-means: spectral clustering and normalized cuts. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 551–556. ACM, 2004.
M. P. Do Carmo. Riemannian Geometry. Birkhauser, 1992.
G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 3rd edition, 1996.
T. H. Grönwall. Note on the derivatives with respect to a parameter of the solutions of a system of differential equations. Annals of Mathematics, pages 292–296, 1919.
L. Hagen and A. B. Kahng. New spectral methods for ratio cut partitioning and clustering. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 11(9):1074–1085, 1992.
T. Hastie, R. Tibshirani, and J. Friedman. Unsupervised learning. In The Elements of Statistical Learning, pages 485–585. Springer, 2009.
T. Iguchi, D. G. Mixon, J. Peterson, and S. Villar. Probably certifiably correct k-means clustering. Mathematical Programming, 165(2):605–642, 2017.
A. K. Jain. Data clustering: 50 years beyond k-means. Pattern Recognition Letters, 31(8):651–666, 2010.
A. Kumar and R. Kannan. Clustering with spectral norm and the k-means algorithm. In Foundations of Computer Science (FOCS), 2010 51st Annual IEEE Symposium on, pages 299–308. IEEE, 2010.
J. Lei and A. Rinaldo. Consistency of spectral clustering in stochastic block models. The Annals of Statistics, 43(1):215–237, 2015.
D. A. Levin, Y. Peres, and E. L. Wilmer. Markov Chains and Mixing Times, volume 107. American Mathematical Society, 2017.
X. Li, Y. Li, S. Ling, T. Strohmer, and K. Wei. When do birds of a feather flock together? k-means, proximity, and conic programming. Mathematical Programming, pages 1–47, 2018.
S. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, 1982.
M. Mahajan, P. Nimbhorkar, and K. Varadarajan. The planar k-means problem is NP-hard. In International Workshop on Algorithms and Computation, pages 274–285. Springer, 2009.
D. G. Mixon, S. Villar, and R. Ward. Clustering subgaussian mixtures by semidefinite programming. Information and Inference: A Journal of the IMA, 6(4):389–415, 2017.
A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: analysis and an algorithm. In Advances in Neural Information Processing Systems, pages 849–856, 2002.
J. Peng and Y. Wei. Approximating k-means-type clustering via semidefinite programming. SIAM Journal on Optimization, 18(1):186–205, 2007.
K. Rohe, S. Chatterjee, and B. Yu. Spectral clustering and the high-dimensional stochastic blockmodel. The Annals of Statistics, pages 1878–1915, 2011.
J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.
A. Singer. From graph to manifold Laplacian: The convergence rate. Applied and Computational Harmonic Analysis, 21(1):128–134, 2006.
A. Singer and H.-T. Wu. Spectral convergence of the connection Laplacian from random samples. Information and Inference: A Journal of the IMA, 6(1):58–123, 2016.
G. W. Stewart. Perturbation theory for the singular value decomposition. Technical Report CS-TR-2539, University of Maryland, Sep 1990.
M. Tepper, A. M. Sengupta, and D. Chklovskii. Clustering is semidefinitely not that hard: Nonnegative sdp for manifold disentangling. The Journal of Machine Learning Research, 19(1):3208–3237, 2018.
N. G. Trillos, M. Gerlach, M. Hein, and D. Slepcev. Error estimates for spectral convergence of the graph Laplacian on random geometric graphs towards the Laplace-Beltrami operator. arXiv preprint arXiv:1801.10108, 2018.
N. G. Trillos and D. Slepčev. A variational approach to the consistency of spectral clustering. Applied and Computational Harmonic Analysis, 45(2):239–281, 2018.
J. A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4):389–434, 2012.
R. Vershynin. Introduction to the non-asymptotic analysis of random matrices. In Y. C. Eldar and G. Kutyniok, editors, Compressed Sensing: Theory and Applications, chapter 5. Cambridge University Press, 2012.
U. Von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.
U. Von Luxburg, M. Belkin, and O. Bousquet. Consistency of spectral clustering. The Annals of Statistics, pages 555–586, 2008.
D. Wagner and F. Wagner. Between min cut and graph bisection. In International Symposium on Mathematical Foundations of Computer Science, pages 744–750. Springer, 1993.
W. Walter. Ordinary Differential Equations, volume 1(182). Springer Science and Media, 1998.
E. P. Xing and M. I. Jordan. On semidefinite relaxation for normalized k-cut and connections to spectral clustering. Technical Report UCB/CSD-03-1265, EECS Department, University of California, Berkeley, Jun 2003.
B. Yan, P. Sarkar, and X. Cheng. Provable estimation of the number of blocks in block models. In Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, volume 84 of Proceedings of Machine Learning Research, pages 1185–1194. PMLR, 09–11 Apr 2018.
Acknowledgements
S.L. thanks Afonso S. Bandeira for fruitful discussions about stochastic block models. The authors are also grateful to the anonymous referees for their careful reading of this paper and suggestions.
Author information
Authors and Affiliations
Corresponding author
Additional information
Communicated by James Renegar.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
T. Strohmer acknowledges partial support from the NSF via Grants DMS 1620455 and DMS 1737943.
Rights and permissions
About this article
Cite this article
Ling, S., Strohmer, T. Certifying Global Optimality of Graph Cuts via Semidefinite Relaxation: A Performance Guarantee for Spectral Clustering. Found Comput Math 20, 367–421 (2020). https://doi.org/10.1007/s10208-019-09421-3
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10208-019-09421-3
Keywords
- Semidefinite programming
- Graph partition
- Unsupervised learning
- Spectral clustering
- Community detection
- Graph Laplacian