Exact Algorithms of Search for a Cluster of the Largest Size in Two Integer 2-Clustering Problems
Numerical Analysis and Applications (IF 0.4), Pub Date: 2019-06-06, DOI: 10.1134/s1995423919020010
A. V. Kel′manov , A. V. Panasenko , V. I. Khandeev

We consider two related discrete optimization problems of searching for a subset of a finite set of points in Euclidean space. Both problems arise from variants of a fundamental data-analysis problem: selecting a subset of similar elements from a set of objects. In each problem, given an input set and a positive real number, the task is to find a cluster (i.e., a subset) of maximum size subject to a constraint on a quadratic clustering function. The input points outside the sought subset form the second (complementary) cluster. In the first problem, the constrained function is the sum, over both clusters, of the intracluster sums of squared distances between the cluster elements and their centers. The center of the first (sought) cluster is unknown and defined as its centroid, while the center of the second cluster is fixed at a given point in Euclidean space (without loss of generality, the origin). In the second problem, the constrained function is the sum, over both clusters, of the weighted intracluster sums of squared distances between the cluster elements and their centers. As in the first problem, the center of the first cluster is unknown and defined as its centroid, while the center of the second cluster is fixed at the origin. We show that both problems are strongly NP-hard. We also present exact algorithms for the case in which the input points have integer components. If the space dimension is bounded by a constant, these algorithms run in pseudopolynomial time.
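
One possible formal reading of the two constraints, written as a sketch. The notation (Y, C, N, d, A, and the centroid ȳ(C)) is ours, not taken from the paper. For Problem 2 the abstract only says "weighted"; taking the weights to be the cluster sizes |C| and |Y \ C| is an assumption made here for illustration, so the paper itself should be consulted for the exact weights. "Integer components" means Y ⊂ Z^d.

```latex
% Input: Y = \{y_1, \dots, y_N\} \subset \mathbb{R}^d and a number A > 0.
% Centroid of the sought cluster C \subseteq Y:
\bar{y}(C) = \frac{1}{|C|} \sum_{y \in C} y

% Problem 1: maximize |C| subject to
\sum_{y \in C} \| y - \bar{y}(C) \|^2 + \sum_{y \in Y \setminus C} \| y \|^2 \le A

% Problem 2 (weights assumed to be the cluster sizes): maximize |C| subject to
|C| \sum_{y \in C} \| y - \bar{y}(C) \|^2 + |Y \setminus C| \sum_{y \in Y \setminus C} \| y \|^2 \le A
```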

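The abstract does not describe the algorithms themselves. As an illustration only, the following is one way an exact, pseudopolynomial search for Problem 1 can work; it is our sketch and not necessarily the authors' method. It rests on the identity f(C) = Σ_{y∈Y} ‖y‖² − ‖s‖²/|C| with s = Σ_{y∈C} y, which follows from Σ_{y∈C} ‖y − ȳ(C)‖² = Σ_{y∈C} ‖y‖² − ‖s‖²/|C|. Feasibility therefore depends only on the size k = |C| and the integer vector s, so a knapsack-style dynamic program over (k, s) pairs suffices. With M the largest absolute coordinate, each s lies in a box of at most (2NM+1)^d integer vectors, which is polynomial in N and M for fixed d. The function names are hypothetical, and an exponential brute-force checker is included for comparison.

```python
from fractions import Fraction
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[int, ...]


def cost_problem1(points: Sequence[Point], idx: Sequence[int]) -> Fraction:
    """Exact Problem 1 constraint value for the cluster C = {points[i] : i in idx}."""
    chosen = [points[i] for i in idx]
    k, d = len(chosen), len(points[0])
    centroid = [Fraction(sum(p[j] for p in chosen), k) for j in range(d)]
    inside = sum(sum((p[j] - centroid[j]) ** 2 for j in range(d)) for p in chosen)
    rest = set(range(len(points))) - set(idx)
    # The second cluster's center is fixed at the origin.
    outside = sum(sum(c * c for c in points[i]) for i in rest)
    return inside + outside


def largest_cluster_bruteforce(points: Sequence[Point], A: float) -> List[int]:
    """Exponential baseline: try every subset, largest sizes first."""
    for k in range(len(points), 0, -1):
        for idx in combinations(range(len(points)), k):
            if cost_problem1(points, idx) <= A:
                return list(idx)
    return []


def largest_cluster_dp(points: Sequence[Point], A: float) -> List[int]:
    """Pseudopolynomial sketch (hypothetical): DP over (cluster size, integer sum vector)."""
    n = len(points)
    if n == 0:
        return []
    d = len(points[0])
    total_sq = sum(sum(c * c for c in p) for p in points)
    # layers[k] maps a sum vector s to one index tuple of size k whose points sum to s.
    layers: List[Dict[Point, Tuple[int, ...]]] = [dict() for _ in range(n + 1)]
    layers[0][(0,) * d] = ()
    for i, p in enumerate(points):
        # Go from large k to small so point i is added at most once (0/1 knapsack order).
        for k in range(i, -1, -1):
            for s, idx in list(layers[k].items()):
                t = tuple(a + b for a, b in zip(s, p))
                layers[k + 1].setdefault(t, idx + (i,))
    for k in range(n, 0, -1):
        for s, idx in layers[k].items():
            # f(C) = total_sq - ||s||^2 / k <= A, multiplied through by k > 0.
            if k * total_sq - sum(c * c for c in s) <= k * A:
                return list(idx)
    return []


pts = [(0, 0), (10, 0), (10, 1)]
print(largest_cluster_dp(pts, 1))          # [1, 2]
print(largest_cluster_bruteforce(pts, 1))  # [1, 2]
```

In the usage example, the pair (10, 0), (10, 1) has centroid (10, 0.5) and intracluster sum 0.5, and the remaining point (0, 0) contributes 0, so f = 0.5 ≤ 1; the full set gives f = 201 − 401/3 ≈ 67.3 and is infeasible. Storing one representative subset per (k, s) is enough because every subset with the same k and s has the same cost.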
Updated: 2019-06-06