Faster Projective Clustering Approximation of Big Data, arXiv - CS - Data Structures and Algorithms



arXiv - CS - Data Structures and Algorithms. Pub Date: 2020-11-26, DOI: arxiv-2011.13476
Adiel Statman, Liat Rozenberg, Dan Feldman

In projective clustering we are given a set of $n$ points in $\mathbb{R}^d$ and wish to cluster them into a set $S$ of $k$ linear subspaces in $\mathbb{R}^d$ according to some given distance function. An $\varepsilon$-coreset for this problem is a weighted (scaled) subset of the input points such that, for every such possible $S$, the sum of these distances is approximated up to a factor of $(1+\varepsilon)$. We propose to reduce the size of existing coresets by giving the first $O(\log(m))$ approximation for the case of $m$-lines clustering in $O(ndm)$ time, compared to the existing $\exp(m)$ solution. We then project the points onto these lines and prove that for a sufficiently large $m$ we obtain a coreset for projective clustering. Our algorithm also generalizes to handle outliers. Experimental results and open code are also provided.
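To make the definitions in the abstract concrete, here is a minimal sketch of the projective clustering cost and of the coreset condition, checked for a single candidate $S$. It is not the authors' algorithm or code. It assumes Euclidean distance (the abstract only says "some given distance function"), subspaces given by orthonormal bases, and a two-sided reading of the $(1+\varepsilon)$ guarantee. All function names are hypothetical.

```python
import math

def dist_to_subspace(p, basis):
    """Euclidean distance from point p to span(basis).

    Assumption: `basis` is a list of orthonormal vectors, so the squared
    distance is |p|^2 minus the squared length of the projection.
    """
    sq_norm = sum(x * x for x in p)
    sq_proj = sum(sum(x * b for x, b in zip(p, v)) ** 2 for v in basis)
    return math.sqrt(max(sq_norm - sq_proj, 0.0))  # clamp rounding noise

def clustering_cost(points, weights, subspaces):
    """Weighted sum, over all points, of the distance to the nearest subspace."""
    return sum(
        w * min(dist_to_subspace(p, basis) for basis in subspaces)
        for p, w in zip(points, weights)
    )

def approximates_for(points, coreset, coreset_weights, subspaces, eps):
    """Check the (1 + eps) condition for ONE candidate set of subspaces.

    A true eps-coreset must pass this check for every possible S; this
    helper only tests the single S it is given.
    """
    full = clustering_cost(points, [1.0] * len(points), subspaces)
    approx = clustering_cost(coreset, coreset_weights, subspaces)
    return abs(approx - full) <= eps * full

# Toy input in R^2: S consists of k = 2 lines through the origin.
points = [(1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
S = [[(1.0, 0.0)], [(0.0, 1.0)]]  # the x-axis and the y-axis
print(clustering_cost(points, [1.0] * 3, S))  # 3.0
```

Only the point (3, 4) lies off both lines, at distance 3 from the nearer one, so the total cost is 3. Passing this check for one $S$ says nothing about other choices of $S$; the guarantee in the abstract is over all of them.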


Updated: 2020-12-01
