BLOCK-DBSCAN: Fast Clustering For Large Scale Data
Pattern Recognition (IF 7.5), Pub Date: 2021-01-01, DOI: 10.1016/j.patcog.2020.107624
Yewang Chen, Lida Zhou, Nizar Bouguila, Cheng Wang, Yi Chen, Jixiang Du

Abstract We analyze the drawbacks of DBSCAN and its variants, and find that the grid technique used in Fast-DBSCAN and ρ-approximate DBSCAN is almost useless in high-dimensional data spaces, because it usually yields a considerable number of redundant distance computations. To address these problems, two techniques are proposed. The first uses an ϵ/2-norm ball to identify Inner Core Blocks, within which all points are core points; it is more efficient than the grid technique because it finds more core points at one time. The second is a fast approximate algorithm for judging whether two Inner Core Blocks are density-reachable from each other. In addition, a cover tree is used to accelerate the density computations. Based on these three techniques, an approximate approach, BLOCK-DBSCAN, is proposed for large-scale data; it runs in about O(n log n) expected time and obtains almost the same result as DBSCAN. BLOCK-DBSCAN has two versions: the L2 version works well for relatively high-dimensional data, and the L∞ version is suitable for high-dimensional data. Experimental results show that BLOCK-DBSCAN is promising and outperforms NQDBSCAN, ρ-approximate DBSCAN, and AnyDBC.
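
The Inner Core Block idea can be illustrated with a small sketch. If a ball of radius ϵ/2 around some point holds at least MinPts points, then by the triangle inequality any two points in that ball are at most ϵ apart, so every point in the ball is a core point and no further density checks are needed for them. The Python below is only a brute-force illustration under stated assumptions: the function names (`neighbors`, `inner_core_block`, `is_core`) are hypothetical, the L2 norm is used, a point counts itself as a neighbor, and a linear scan stands in for the cover tree mentioned in the abstract. It is not the authors' implementation.

```python
import math
import random

def neighbors(points, center, radius):
    """Indices of all points within `radius` (L2) of `center`, by linear scan."""
    return [i for i, q in enumerate(points) if math.dist(center, q) <= radius]

def inner_core_block(points, i, eps, min_pts):
    """Return the eps/2-ball around points[i] if it holds >= min_pts points.

    Hypothetical helper: returns None when the ball is too sparse.
    """
    block = neighbors(points, points[i], eps / 2)
    return block if len(block) >= min_pts else None

def is_core(points, i, eps, min_pts):
    """Standard DBSCAN core test (assumption: the point counts itself)."""
    return len(neighbors(points, points[i], eps)) >= min_pts

# Synthetic 2-D data: one dense Gaussian blob.
random.seed(0)
points = [(random.gauss(0, 0.3), random.gauss(0, 0.3)) for _ in range(200)]
eps, min_pts = 0.4, 10

# Collect every eps/2-ball that qualifies as an Inner Core Block.
blocks = [b for i in range(len(points))
          if (b := inner_core_block(points, i, eps, min_pts))]

# Every member of every block should pass the ordinary core-point test.
all_core = all(is_core(points, j, eps, min_pts) for b in blocks for j in b)
print(len(blocks), all_core)
```

In this sketch one ϵ/2 query certifies a whole block of core points at once, which is the source of the claimed saving over testing each point individually.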

Updated: 2021-01-01