Abstract
The spherical k-means problem (SKMP) is an important variant of the k-means clustering problem (KMP). In this paper, we consider the SKMP, which aims to divide the n points of a given data set \({\mathcal {S}}\) into k clusters so as to minimize the total cosine dissimilarity from each data point to its closest cluster center. Our main contribution is an algorithm for the SKMP that achieves a constant approximation ratio in expectation, obtained by integrating the seeding algorithm for the KMP with the local search technique. By exploiting the structure of the clusters, we further obtain an improved LocalSearch++ algorithm involving \(\varepsilon k\) local search steps.
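The following Python sketch shows how k-means++-style seeding and LocalSearch++-style swap steps fit together under the cosine dissimilarity. It is only an illustration under stated assumptions and does not reproduce the authors' algorithm or its analysis. The data points are assumed to be nonzero and are first projected onto the unit sphere. The dissimilarity is assumed to be \(1-\cos(x,c)\). The function names and the `steps` parameter, which stands in for the \(\varepsilon k\) local search steps, are hypothetical. Candidate centers are restricted to data points, and each candidate is sampled with probability proportional to its current dissimilarity. For unit vectors, \(\|x-c\|^2 = 2(1-x^\top c)\), so this sampling rule is exactly the \(D^2\) sampling of k-means++ carried over to the sphere.

```python
import math
import random


def normalize(v):
    """Project a nonzero vector onto the unit sphere (assumes v != 0)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dissim(x, c):
    """Assumed cosine dissimilarity 1 - cos(x, c) for unit vectors, clamped at 0."""
    return max(0.0, 1.0 - sum(a * b for a, b in zip(x, c)))


def cost(points, centers):
    """SKMP objective: each point pays its dissimilarity to the closest center."""
    return sum(min(dissim(p, c) for c in centers) for p in points)


def seed(points, k, rng):
    """k-means++-style seeding: the first center is uniform, and each later one is
    sampled with probability proportional to its dissimilarity to the nearest center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(dissim(p, c) for c in centers) for p in points]
        if sum(weights) == 0:  # every point already coincides with a center
            break
        centers.append(rng.choices(points, weights=weights)[0])
    return centers


def local_search_step(points, centers, rng):
    """One LocalSearch++-style step: sample a candidate by the same rule, then
    swap it in for the center whose removal lowers the cost most (if any does)."""
    weights = [min(dissim(p, c) for c in centers) for p in points]
    if sum(weights) == 0:
        return centers
    candidate = rng.choices(points, weights=weights)[0]
    best, best_cost = centers, cost(points, centers)
    for i in range(len(centers)):
        trial = centers[:i] + [candidate] + centers[i + 1:]
        trial_cost = cost(points, trial)
        if trial_cost < best_cost:  # accept only strict improvements
            best, best_cost = trial, trial_cost
    return best


def spherical_kmeanspp_local_search(data, k, steps, seed_value=0):
    """Hypothetical driver: normalize, seed, then run `steps` local search steps."""
    rng = random.Random(seed_value)
    points = [normalize(v) for v in data]
    centers = seed(points, k, rng)
    for _ in range(steps):
        centers = local_search_step(points, centers, rng)
    return centers, cost(points, centers)
```

For example, `spherical_kmeanspp_local_search([[1, 0.1], [1, -0.1], [0.1, 1], [-0.1, 1]], k=2, steps=3)` returns one center from each angular group, with total cost \(2(1 - 0.99/1.01) \approx 0.0396\). If seeding places both centers in the same group, the first swap corrects this. The sketch accepts only strictly improving swaps, so the cost never increases across steps.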
Acknowledgements
The first two authors are supported by National Natural Science Foundation of China (No. 11871081) and Beijing Natural Science Foundation Project No. Z200002. The third author is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Grant 06446, and National Natural Science Foundation of China (Nos. 11771386, 11728104). The fourth author is supported by National Natural Science Foundation of China (No. 11201333).
Additional information
A preliminary version of this paper appeared in Proceedings of the 14th International Conference on Algorithmic Aspects in Information and Management, pp. 131–140, 2020.
About this article
Cite this article
Tian, X., Xu, D., Du, D. et al. The spherical k-means++ algorithm via local search scheme. J Comb Optim 44, 2375–2394 (2022). https://doi.org/10.1007/s10878-021-00737-x