Skip to main content
Log in

Exact Algorithms of Search for a Cluster of the Largest Size in Two Integer 2-Clustering Problems

  • Published:
Numerical Analysis and Applications Aims and scope Submit manuscript

Abstract

We consider two related discrete optimization problems of searching for a subset in a finite set of points in Euclidean space. Both problems are induced by versions of a fundamental problem in data analysis, namely, that of selecting a subset of similar elements in a set of objects. In each problem, given an input set and a positive real number, it is required to find a cluster (i.e., a subset) of the largest size under constraints on a quadratic clusterization function. The points in the input set, which are outside the sought-for subset, are treated as a second (complementary) cluster. In the first problem, the function under the constraint is the sum over both clusters of the intracluster sums of the squared distances between the elements of the clusters and their centers. The center of the first (i.e., the sought-for) cluster is unknown and determined as a centroid, while the center of the second one is fixed at a given point in Euclidean space (without loss of generality, at the origin of coordinates). In the second problem, the function under the constraint is the sum over both clusters of the weighted intracluster sums of the squared distances between the elements of the clusters and their centers. As in the first problem, the center of the first cluster is unknown and determined as a centroid, while the center of the second one is fixed at the origin of coordinates. In this paper, we show that both problems are strongly NP-hard. Also, we present exact algorithms for the problems in which the input points have integer components. If the space dimension is bounded by some constant, the algorithms are pseudopolynomial.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Similar content being viewed by others

References

  1. MacQueen, J.B., Some Methods for Classification and Analysis of Multivariate Observations, Proc. of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley: Univ. of California Press, 1967, vol. 1, pp. 281–297.

    MathSciNet  MATH  Google Scholar 

  2. Rao, M., atCluster Analysis and Mathematical Programming, J.Am. Stat. Assoc, 1971, vol. 66, pp. 622–626.

    Article  Google Scholar 

  3. Hansen, P., Jaumard, B., and Mladenovich, N., Minimum Sum of Squares Clustering in a Low Dimensional Space, J. Classific, 1998, vol. 15, pp. 37–55.

    Article  MathSciNet  MATH  Google Scholar 

  4. Hansen, P. and Jaumard, B., Cluster Analysis and Mathematical Programming, Math. Programming, 1997, vol. 79, pp. 191–215.

    MathSciNet  MATH  Google Scholar 

  5. Fisher, R.A., Statistical Methods and Scientific Inference, New York: Hafner, 1956.

    MATH  Google Scholar 

  6. Jain, A.K., Data Clustering: 50 Years beyond A;-Means, Patt. Recog. Lett., 2010, vol. 31, iss. 8, pp. 651–666.

    Article  Google Scholar 

  7. Aloise, D., Deshpande, A., Hansen, P., and Popat, P., NP-Hardness of Euclidean Sum-of-Squares Clustering, Mach. Learning, 2009, vol. 75, iss. 2, pp. 245–248.

    Article  MATH  Google Scholar 

  8. Drineas, P., Frieze, A., Kannan, R., Vempala, S., and Vinay, V. Clustering Large Graphs via the Singular Value Decomposition, Mach. Learning, 2004, vol. 56, pp. 9–33.

    Article  MATH  Google Scholar 

  9. Dolgushev, A.V. and Kel'manov, A.V., On the Algorithmic Complexity of a Problem in Cluster Analysis, J. Appl. Ind. Math., 2011, vol. 2011, no. 5, pp. 2–191.

    Google Scholar 

  10. Mahajan, M., Nimbhorkar, P., and Varadarajan, K., The Planar k;-Means Problem Is NP-Hard, Theor. Comp. Sci., 2012, vol. 442, pp. 3–21.

    Article  MathSciNet  MATH  Google Scholar 

  11. Brucker, P., On the Complexity of Clustering Problems, Led. Not. Econ. Math. Systems, 1978, vol. 157, pp. 45–54.

    Article  MathSciNet  MATH  Google Scholar 

  12. Bern, M. and Eppstein, D., Approximation Algorithms for Geometric Problems, in Approximation Algorithms for NP-Hard Problems, Boston: PWS, 1997, pp. 296–345.

    Google Scholar 

  13. Indyk, P., A Sublinear Time Approximation Scheme for Clustering in Metric Space, Proc. of the 40th Ann. IEEE Symp. on Foundations of Computer Science (FOCS), 1999, pp. 154–159.

    Google Scholar 

  14. De la Vega, F. and Kenyon, C., A Randomized Approximation Scheme for Metric Max-Cut, I. Comput. Syst. Set., 2001, vol. 63, pp. 531–541.

    Article  MathSciNet  MATH  Google Scholar 

  15. De la Vega, E. Karpinski, M., Kenyon, C., and Rabani, Y., Polynomial Time Approximation Schemes for Metric Min-Sum Clustering, Electronic Colloquium on Computational Complexity (ECCC), 2002, rep. no. 25.

  16. Hasegawa, S., Imai, E., Inaba, M., Katoh, N., and Nakano, J., Efficient Algorithms for Variance-Based fc-Clustering, Proc. of the 1st Pacific Conference on Computer Graphics and Applications (Pacific Graphics'93, Seoul, K.rea), River Edge, N.: World Scientific, 1993, vol. 1, pp. 75–89.

    Google Scholar 

  17. Inaba, M., Katoh, N., and Imai, E., Applications of Weighted Voronoi Diagrams and Randomization to Variance-Based k;-Clustering, SCG'94 Proc. of the Tenth Annual Symposium on Computational Geometry, Stony Brook, N., USA, 1994, pp. 332–339.

    Chapter  Google Scholar 

  18. Sahni, S. and Gonzalez, T. P-Complete Approximation Problems, I. ACM, 1976, vol. 23, pp. 555–566.

    MathSciNet  MATH  Google Scholar 

  19. Ageev, A.A., Kel'manov, A.V., and Pyatkin, A.V., NP-Hardness of the Euclidean Max-Cut Problem, Dokl. Math., 2014, vol. 2014, no. 89, pp. 3–343.

    MATH  Google Scholar 

  20. Ageev, A.A., Kel'manov, A.V., and Pyatkin, A.V., Complexity of the Weighted Max-Cut in Euclidean Space, 7. Appl. Ind. Math., 2014, vol. 2014, no. 8, pp. 4–453.

    MATH  Google Scholar 

  21. Kel'manov, A.V. and Pyatkin, A.V., On the Complexity of a Search for a Subset of “Similar” Vectors, Dokl. Math., 2008, vol. 78, no. 1, pp. 574/575.

    MATH  Google Scholar 

  22. Kel'manov, A.V. and Pyatkin, A.V, On a Version of the Problem of Choosing a Vector Subset, I. Appl. Ind. Math., 2009, vol. 2009, no. 3, pp. 4–447.

    Google Scholar 

  23. Kel'manov, A.V. and Pyatkin, A.V., NP-Hardness of Some Quadratic Euclidean 2-Clustering Problems, Dokl. Math., 2015, vol. 2015, no. 92, pp. 2–634.

    MathSciNet  MATH  Google Scholar 

  24. Kel'manov, A.V. and Pyatkin, A.V, On the Complexity of Some Quadratic Euclidean 2-Clustering Problems, Comput. Math. Math. Phys., 2016, vol. 2016, no. 56, pp. 3–491.

    MathSciNet  MATH  Google Scholar 

  25. Bishop, C.M., Pattern Recognition and Machine Learning, New York: Springer, 2006.

    MATH  Google Scholar 

  26. James, G., Witten, D., Hastie, T., and Tibshirani, R., An Introduction to Statistical Learning, New York: Springer, 2013.

    Book  MATH  Google Scholar 

  27. Hastie, T. Tibshirani, R., and Friedman, J., The Elements of Statistical Learning, 2nd ed., Springer, 2009.

    Book  MATH  Google Scholar 

  28. Aggarwal, C.C., Data Mining: The Textbook, Springer, 2015.

    MATH  Google Scholar 

  29. Goodfellow, I., Bengio, Y. and Courville, A., Deep Learning (Adaptive Computation and Machine Learning Series), MIT Press, 2017.

    MATH  Google Scholar 

  30. Shirkhorshidi, A.S., Aghabozorgi, S., Wah, T.Y., and Herawan, T. Big Data Clustering: A Review, LNCS, 2014, vol. 8583, pp. 707–720.

    Google Scholar 

  31. Pach, J. and Agarwal, R.K., Combinatorial Geometry, New York: Wiley, 1995.

    Book  MATH  Google Scholar 

  32. Kel'manov, A.V. and Khandeev, V.., A 2-Approximation Polynomial Algorithm for a Clustering Problem, 7. Appl. Ind. Math., 2013, vol. 2013, no. 7, pp. 4–515.

    MATH  Google Scholar 

  33. Gimadi, E.Kh., Kel'manov, A.V., Kel'manova, M.A., and Khamidullin, S.A., A Posteriori Detection of a Quasiperiodic Fragment in a Numerical Sequence with a Given Number of Recurrences, Sib. Zh. Ind. Mat., 2006, vol. 2006, no. 9, pp. 1–55.

    MATH  Google Scholar 

  34. Gimadi, E.Kh., Kel'manov, A.V., Kel'manova, M.A., and Khamidullin, S.A., A Posteriori Detecting a Quasiperiodic Fragment in a Numerical Sequence, Pattern Recogn. Image An., 2008, vol. 2008, no. 18, pp. 1–30.

    MATH  Google Scholar 

  35. Baburin, A.E., Gimadi, E.Kh., Glebov, N.I., and Pyatkin, A.V., The Problem of Finding a Subset of Vectors with the Maximum Total Weight, 7. Appl. Ind. Math., 2008, vol. 2008, no. 2, pp. 1–32.

    MathSciNet  MATH  Google Scholar 

  36. Gimadi, E.Kh., Pyatkin, A.V, and Rykov, L.., On Polynomial Solvability of Some Problems of a Vector Subset Choice in a Euclidean Space of Fixed Dimension, 7. Appl. Ind. Math., 2010, vol. 2010, no. 4, pp. 1–48.

    Google Scholar 

  37. Shenmaier, V.V., Solving Some Vector Subset Problems by Voronoi Diagrams, 7. Appl. Ind. Math., 2016, vol. 2016, no. 10, pp. 4–560.

    MATH  MathSciNet  Google Scholar 

  38. Kel'manov, A.V. and Khandeev, V.., An Exact Pseudopolynomial Algorithm for a Problem of the Two-Cluster Partitioning of a Set of Vectors, 7. Appl. Ind. Math., 2015, vol. 2015, no. 9, pp. 4–497.

    MATH  Google Scholar 

  39. Dolgushev, A.V. and Kel'manov, A.V., An Approximation Algorithm for Solving a Problem of Cluster Analysis, 7. Appl. Ind. Math., 2011, vol. 2011, no. 5, pp. 4–551.

    MathSciNet  MATH  Google Scholar 

  40. Dolgushev, A.V., Kel'manov, A.V., and Shenmaier, V.V., Polynomial-Time Approximation Scheme for a Problem of Partitioning a Finite Set into Two Clusters, Proc. Steklov Inst. Math., 2016, vol. 295, suppl. 1, pp. 47–56.

    Article  MathSciNet  MATH  Google Scholar 

  41. Kel'manov, A.V. and Khandeev, V.I., Fully Polynomial-Time Approximation Scheme for a Special Case of a Quadratic Euclidean 2-Clustering Problem, J. Appl. Ind. Math., 2016, vol. 2016, no. 56, pp. 2–334.

    MATH  Google Scholar 

  42. Kel'manov, A.V., Motkova, A.V., and Shenmaier, V.V., An Approximation Scheme for a Weighted Two-Cluster Partition Problem, LNCS, 2018, vol. 10716, pp. 323–333.

    MATH  Google Scholar 

  43. Kel'manov, A.V. and Khandeev, V.I., A Randomized Algorithm for Two-Cluster Partition of a Set of Vectors, Comput. Math. Math. Phys., 2015, vol. 2015, no. 55, pp. 2–330.

    MathSciNet  MATH  Google Scholar 

  44. Kel'manov, A.V. and Motkova, A.V, Polynomial-Time Approximation Algorithm for the Problem of Cardinality-Weighted Variance-Based 2-Clustering with a Given Center, Comput. Math. Math. Phys., 2018, vol. 2018, no. 58, pp. 1–130.

    MathSciNet  MATH  Google Scholar 

  45. Kel'manov, A.V. and Motkova, A.V, Exact Pseudopolynomial Algorithms for a Balanced 2-Clustering Problem, I. Appl. Ind. Math., 2016, vol. 2016, no. 10, pp. 3–349.

    MathSciNet  MATH  Google Scholar 

  46. Kel'manov, A.V. and Motkova, A.V, A Fully Polynomial-Time Approximation Scheme for a Special Case of a Balanced 2-Clustering Problem, LNCS, 2016, vol. 9869, pp. 182–192.

    MathSciNet  MATH  Google Scholar 

  47. Kel'manov, A.V, Khandeev, V.., and Panasenko, A.V, Randomized Algorithms for Some Clustering Problems, Comm. Comput. Inform. Sci., 2018, vol. CCIS-871, pp. 109–119.

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to A. V. Kel′manov, A. V. Panasenko or V. I. Khandeev.

Additional information

Russian Text © The Author(s), 2019, published in Sibirskii Zhurnal Vychislitel’noi Matematiki, 2019, Vol. 22, No. 2, pp. 121–136.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Kel′manov, A.V., Panasenko, A.V. & Khandeev, V.I. Exact Algorithms of Search for a Cluster of the Largest Size in Two Integer 2-Clustering Problems. Numer. Analys. Appl. 12, 105–115 (2019). https://doi.org/10.1134/S1995423919020010

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1134/S1995423919020010

Keywords

Navigation