
The spherical k-means++ algorithm via local search scheme

Journal of Combinatorial Optimization

Abstract

The spherical k-means problem (SKMP) is an important variant of the k-means clustering problem (KMP). In this paper, we consider the SKMP, which asks for a partition of the n points of a given data set \(\mathcal{S}\) into k clusters that minimizes the sum of the cosine dissimilarities between each data point and its closest cluster center. Our main contribution is an expected constant-factor approximation algorithm for the SKMP, obtained by combining the seeding algorithm for the KMP with the local search technique. By exploiting the structure of the clusters, we further obtain an improved LocalSearch++ algorithm that uses \(\varepsilon k\) local search steps.
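The combination of seeding and local search described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm or analysis: it assumes the cosine dissimilarity between unit vectors takes the form 1 - <x, c>, restricts centers to data points, and uses a generic "sample a candidate by cost, then apply the best improving swap" step. All function names, the `steps` parameter, and the `seed_value` parameter are hypothetical and chosen only for illustration.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length so it lies on the sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cos_dissim(x, c):
    """Assumed cosine dissimilarity 1 - <x, c> for unit vectors.

    Clamped at 0 so rounding errors never yield a negative weight.
    """
    return max(0.0, 1.0 - sum(a * b for a, b in zip(x, c)))


def point_costs(points, centers):
    """Dissimilarity from each point to its closest center."""
    return [min(cos_dissim(p, c) for c in centers) for p in points]


def sample_by_cost(points, costs, rng):
    """Draw one point with probability proportional to its current cost."""
    if sum(costs) <= 0.0:
        # Every point coincides with a center; fall back to uniform sampling.
        return rng.choice(points)
    return rng.choices(points, weights=costs, k=1)[0]


def seed(points, k, rng):
    """k-means++-style seeding driven by the cosine dissimilarity."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        costs = point_costs(points, centers)
        centers.append(sample_by_cost(points, costs, rng))
    return centers


def local_search_step(points, centers, rng):
    """Sample a candidate and apply the best cost-reducing swap, if any."""
    candidate = sample_by_cost(points, point_costs(points, centers), rng)
    best, best_cost = centers, sum(point_costs(points, centers))
    for i in range(len(centers)):
        trial = centers[:i] + [candidate] + centers[i + 1:]
        trial_cost = sum(point_costs(points, trial))
        if trial_cost < best_cost:
            best, best_cost = trial, trial_cost
    return best


def spherical_kmeanspp_ls(points, k, steps, seed_value=0):
    """Seeding followed by `steps` local search swaps (hypothetical driver)."""
    rng = random.Random(seed_value)
    unit = [normalize(p) for p in points]
    centers = seed(unit, k, rng)
    for _ in range(steps):
        centers = local_search_step(unit, centers, rng)
    return centers, sum(point_costs(unit, centers))
```

Calling `spherical_kmeanspp_ls(points, k, steps)` with `steps` set to 0 gives the seeding result alone. Because a swap is accepted only when it strictly lowers the total cost, additional steps with the same `seed_value` can never make the returned cost worse.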



Acknowledgements

The first two authors are supported by the National Natural Science Foundation of China (No. 11871081) and the Beijing Natural Science Foundation (Project No. Z200002). The third author is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Grant 06446 and the National Natural Science Foundation of China (Nos. 11771386, 11728104). The fourth author is supported by the National Natural Science Foundation of China (No. 11201333).

Author information


Corresponding author

Correspondence to Ling Gai.

Additional information


A preliminary version of this paper appeared in Proceedings of the 14th International Conference on Algorithmic Aspects in Information and Management, pp. 131–140, 2020.


About this article


Cite this article

Tian, X., Xu, D., Du, D. et al. The spherical k-means++ algorithm via local search scheme. J Comb Optim 44, 2375–2394 (2022). https://doi.org/10.1007/s10878-021-00737-x


  • DOI: https://doi.org/10.1007/s10878-021-00737-x
