A PROBABILISTIC ℓ1 METHOD FOR CLUSTERING HIGH-DIMENSIONAL DATA
Tsvetan Asamov, Adi Ben-Israel
Probability in the Engineering and Informational Sciences (IF 0.7), Pub Date: 2021-04-05, DOI: 10.1017/s0269964820000479
In general, the clustering problem is NP-hard, and global optimality cannot be established for non-trivial instances. For high-dimensional data, distance-based methods for clustering or classification face an additional difficulty: the unreliability of distances in very high-dimensional spaces. We propose a probabilistic, distance-based, iterative method for clustering data in very high-dimensional space, using the ℓ1-metric, which is less sensitive to high dimensionality than the Euclidean distance. For K clusters in ℝⁿ, the problem decomposes into K problems coupled by probabilities, and an iteration reduces to finding Kn weighted medians of points on a line. The complexity of the algorithm is linear in the dimension of the data space, and its performance was observed to improve significantly as the dimension increases.
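To make the "Kn weighted medians of points on a line" step concrete, here is a minimal Python sketch of one center update. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the formulas for the coupling probabilities, so the weight matrix `W` is a hypothetical input (one nonnegative weight per point and cluster), and the names `weighted_median` and `update_centers` are made up for this example.

```python
import numpy as np

def weighted_median(x, w):
    """Return a minimizer of sum_i w[i] * |x[i] - c| over c (the lower weighted median)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(x)
    x_sorted, w_sorted = x[order], w[order]
    cum = np.cumsum(w_sorted)
    # First sorted position where the cumulative weight reaches half the total.
    idx = np.searchsorted(cum, 0.5 * cum[-1])
    return x_sorted[idx]

def update_centers(X, W):
    """One center update: K*n independent one-dimensional weighted medians.

    X: (N, n) data matrix.
    W: (N, K) nonnegative weights. ASSUMPTION: how these are derived from
       membership probabilities is not specified in the abstract.
    Returns a (K, n) array of centers.
    """
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    n = X.shape[1]
    K = W.shape[1]
    centers = np.empty((K, n))
    for k in range(K):
        for j in range(n):
            # Under the l1 metric the objective separates by coordinate,
            # so each coordinate of each center is solved on its own.
            centers[k, j] = weighted_median(X[:, j], W[:, k])
    return centers
```

Because each of the Kn subproblems involves only one coordinate of the data, the work per update grows linearly with the dimension n, which is consistent with the complexity claim in the abstract.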
Updated: 2021-04-05