On Approximability of Clustering Problems Without Candidate Centers

arXiv - CS - Computational Complexity. Pub Date: 2020-09-30, DOI: arxiv-2010.00087
Vincent Cohen-Addad, Karthik C. S., and Euiwoong Lee

The k-means objective is arguably the most widely-used cost function for modeling clustering tasks in a metric space. In practice and historically, k-means is thought of in a continuous setting, namely where the centers can be located anywhere in the metric space. For example, the popular Lloyd's heuristic locates a center at the mean of each cluster. Despite persistent efforts on understanding the approximability of k-means, and other classic clustering problems such as k-median and k-minsum, our knowledge of the hardness of approximation factors of these problems remains quite poor. In this paper, we significantly improve upon the hardness of approximation factors known in the literature for these objectives. We show that if the input lies in a general metric space, it is NP-hard to approximate:

$\bullet$ Continuous k-median to a factor of $2-o(1)$; this improves upon the previous inapproximability factor of 1.36 shown by Guha and Khuller (J. Algorithms '99).

$\bullet$ Continuous k-means to a factor of $4-o(1)$; this improves upon the previous inapproximability factor of 2.10 shown by Guha and Khuller (J. Algorithms '99).

$\bullet$ k-minsum to a factor of $1.415$; this improves upon the APX-hardness shown by Guruswami and Indyk (SODA '03).

Our results shed new and perhaps counter-intuitive light on the differences between clustering problems in the continuous setting versus the discrete setting (where the candidate centers are given as part of the input).
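To make the continuous versus discrete distinction concrete, here is a minimal sketch in Python. It is an illustration only and is not taken from the paper: it assumes points in Euclidean space (the paper's results concern general metric spaces, where a mean need not exist), and the helper names `kmeans_cost`, `mean_center`, and `best_discrete_center` are hypothetical. It compares the cost of a single cluster when the center is placed at the mean, as Lloyd's heuristic does, with the cost when the center must be chosen from a given candidate set.

```python
# Illustrative sketch only: Euclidean points, one cluster, hypothetical helper names.

def kmeans_cost(points, center):
    """Sum of squared Euclidean distances from each point to the center."""
    return sum(
        sum((p_i - c_i) ** 2 for p_i, c_i in zip(p, center))
        for p in points
    )

def mean_center(points):
    """Continuous setting: the center may be anywhere, so use the coordinate-wise mean."""
    n = len(points)
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dim))

def best_discrete_center(points, candidates):
    """Discrete setting: the center must be one of the given candidates."""
    return min(candidates, key=lambda c: kmeans_cost(points, c))

# A two-point cluster; in the discrete run the candidates are the input points themselves.
cluster = [(0.0, 0.0), (2.0, 0.0)]

continuous = mean_center(cluster)
discrete = best_discrete_center(cluster, cluster)

continuous_cost = kmeans_cost(cluster, continuous)
discrete_cost = kmeans_cost(cluster, discrete)

print("continuous center:", continuous, "cost:", continuous_cost)
print("discrete center:  ", discrete, "cost:", discrete_cost)
```

On this toy input the mean sits midway between the two points, so restricting the center to the candidate set doubles the cost of the cluster. The example only shows that the two objectives can differ; it says nothing about the hardness factors stated in the abstract.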


Updated: 2020-10-08
