Faster Projective Clustering Approximation of Big Data
arXiv - CS - Data Structures and Algorithms Pub Date : 2020-11-26 , DOI: arxiv-2011.13476
Adiel Statman, Liat Rozenberg, Dan Feldman

In projective clustering we are given a set of $n$ points in $R^d$ and wish to cluster them into a set $S$ of $k$ linear subspaces in $R^d$ according to some given distance function. An $\eps$-coreset for this problem is a weighted (scaled) subset of the input points such that for every such possible $S$ the sum of these distances is approximated up to a factor of $(1+\eps)$. We propose to reduce the size of existing coresets by giving the first $O(\log(m))$ approximation for the case of $m$-lines clustering in $O(ndm)$ time, compared to the existing $\exp(m)$ solution. We then project the points onto these lines and prove that for a sufficiently large $m$ we obtain a coreset for projective clustering. Our algorithm also generalizes to handle outliers. Experimental results and open code are also provided.
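
The projection step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes the $m$ lines are already given, that they pass through the origin, and that the distance function is the Euclidean distance. The function names (`project_onto_nearest_line`, `distance_to_line`, and so on) are hypothetical and chosen only for this illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Standard inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v: Vector) -> List[float]:
    """Return v scaled to unit length; a zero vector spans no line."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("a line direction must be non-zero")
    return [a / norm for a in v]


def project_onto_line(p: Vector, unit_dir: Vector) -> List[float]:
    """Orthogonal projection of p onto the line spanned by unit_dir."""
    t = dot(p, unit_dir)
    return [t * a for a in unit_dir]


def distance_to_line(p: Vector, unit_dir: Vector) -> float:
    """Euclidean distance from p to the line spanned by unit_dir."""
    q = project_onto_line(p, unit_dir)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def project_onto_nearest_line(
    points: Sequence[Vector], directions: Sequence[Vector]
) -> Tuple[List[List[float]], float]:
    """Project every point onto its nearest line (assumed through the origin).

    Returns the projected points and the sum of point-to-line distances.
    The loop visits n points, m lines, and d coordinates each.
    """
    units = [normalize(d) for d in directions]
    projected: List[List[float]] = []
    total_cost = 0.0
    for p in points:
        best = min(units, key=lambda u: distance_to_line(p, u))
        total_cost += distance_to_line(p, best)
        projected.append(project_onto_line(p, best))
    return projected, total_cost


if __name__ == "__main__":
    pts = [(2, 1), (1, 3)]
    lines = [(1, 0), (0, 1)]  # the two coordinate axes in R^2
    proj, cost = project_onto_nearest_line(pts, lines)
    print(proj, cost)
```

In this toy input each point is assigned to the closer coordinate axis. How the $m$ lines are chosen, and why the projected points yield a coreset, is the subject of the paper and is not reproduced here.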

Updated: 2020-12-01