A PROBABILISTIC ℓ1 METHOD FOR CLUSTERING HIGH-DIMENSIONAL DATA, Probability in the Engineering and Informational Sciences



Probability in the Engineering and Informational Sciences (IF 0.7), Pub Date: 2021-04-05, DOI: 10.1017/s0269964820000479
Tsvetan Asamov, Adi Ben-Israel


In general, the clustering problem is NP-hard, and global optimality cannot be established for non-trivial instances. For high-dimensional data, distance-based methods for clustering or classification face an additional difficulty, the unreliability of distances in very high-dimensional spaces. We propose a probabilistic, distance-based, iterative method for clustering data in very high-dimensional space, using the ℓ1-metric that is less sensitive to high dimensionality than the Euclidean distance. For K clusters in ℝⁿ, the problem decomposes to K problems coupled by probabilities, and an iteration reduces to finding K·n weighted medians of points on a line. The complexity of the algorithm is linear in the dimension of the data space, and its performance was observed to improve significantly as the dimension increases.
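
To make the "K·n weighted medians" step concrete, the following is a minimal Python sketch of one iteration, written under stated assumptions rather than taken from the paper. The abstract does not give the membership rule or the median weights, so the sketch assumes membership probabilities inversely proportional to the ℓ1 distance to each center, and uses the squared probabilities as weights. The names `weighted_median`, `l1_iteration`, and `eps` are hypothetical. What the sketch does illustrate is the structure described above: each of the K centers is updated coordinate by coordinate, and each coordinate update is a weighted median of N points on a line.

```python
import numpy as np


def weighted_median(values, weights):
    """Return a minimizer m of sum_i weights[i] * |values[i] - m|."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    # First sorted position where the cumulative weight reaches half the total.
    idx = np.searchsorted(cum, 0.5 * cum[-1])
    return v[idx]


def l1_iteration(X, centers, eps=1e-12):
    """One illustrative iteration: update probabilities, then K*n weighted medians.

    X is an (N, n) data array and centers is a (K, n) array.
    The probability and weight rules below are ASSUMPTIONS for illustration.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    K, n = centers.shape

    # l1 distances from every point to every center, shape (N, K).
    # eps avoids division by zero when a point coincides with a center.
    D = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2) + eps

    # Assumed membership rule: probability inversely proportional to distance.
    inv = 1.0 / D
    P = inv / inv.sum(axis=1, keepdims=True)

    # Assumed weights for the center update: squared probabilities.
    W = P ** 2

    # The K center problems are coupled only through P; each one splits
    # into n independent one-dimensional weighted-median problems.
    new_centers = np.empty_like(centers)
    for k in range(K):
        for j in range(n):
            new_centers[k, j] = weighted_median(X[:, j], W[:, k])
    return new_centers, P
```

Each iteration solves K·n one-dimensional problems, so for a fixed number of points and clusters the work grows linearly with the dimension n, which matches the complexity claim in the abstract.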


Updated: 2021-04-05
