Incremental Density-Based Clustering on Multicore Processors
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 23.6), Pub Date: 2020-09-10, DOI: 10.1109/tpami.2020.3023125
Son T. Mai 1, Jon Jacobsen 2, Sihem Amer-Yahia 3, Ivor Spence 1, Nhat-Phuong Tran 1, Ira Assent 2, Quoc Viet Hung Nguyen 4

Density-based clustering is a fundamental data clustering technique with many real-world applications. However, when the database changes frequently, updating the clustering results efficiently rather than reclustering from scratch remains a challenging task. In this work, we introduce IncAnyDBC, a unique parallel incremental data clustering approach to deal with this problem. First, to reduce update overhead, IncAnyDBC can process changes in bulk rather than in batches, as state-of-the-art methods do. Second, it maintains an underlying cluster structure called the object node graph during the clustering process and uses it as a basis for incrementally updating clusters with respect to inserted or deleted objects in the database, propagating changes around affected nodes only. In addition, IncAnyDBC actively and iteratively examines the graph and chooses only a small set of the most meaningful objects to produce the exact clustering results of DBSCAN, or to approximate those results under arbitrary time constraints. This makes it more efficient than other existing methods. Third, by processing objects in blocks, IncAnyDBC can be efficiently parallelized on multicore CPUs, yielding a work-efficient method. It runs much faster than existing techniques when using one thread while still scaling well with multiple threads. Experiments on various large real datasets demonstrate the performance of IncAnyDBC.
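The locality that incremental density-based clustering relies on can be illustrated with a minimal Python sketch. This is not IncAnyDBC and does not use its object node graph; it only shows the standard DBSCAN core-point test and the fact that inserting one point can change the core status only of points within eps of it. The function names (neighbors, core_flags, insert_point), the sample points, and the parameter values are all hypothetical, and the linear-scan neighbor search is chosen for brevity rather than speed.

```python
from math import dist


def neighbors(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def core_flags(points, eps, min_pts):
    """DBSCAN core test: a point is core if its eps-neighborhood
    contains at least min_pts points."""
    return [len(neighbors(points, i, eps)) >= min_pts for i in range(len(points))]


def insert_point(points, flags, new_point, eps, min_pts):
    """Insert new_point and recheck only the points within eps of it.

    Points farther away keep their neighbor counts, so their core
    status cannot change. Returns the indices whose status changed.
    """
    points.append(new_point)
    flags.append(False)
    new_idx = len(points) - 1
    affected = neighbors(points, new_idx, eps)  # includes new_idx itself
    changed = []
    for i in affected:
        is_core = len(neighbors(points, i, eps)) >= min_pts
        if is_core != flags[i]:
            flags[i] = is_core
            changed.append(i)
    return changed


# Hypothetical toy data: two sparse pairs, none of them core at first.
points = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.4, 5.0)]
flags = core_flags(points, eps=1.0, min_pts=3)
changed = insert_point(points, flags, (0.2, 0.3), eps=1.0, min_pts=3)
```

After the insertion, only the points near the new one are rechecked, and the updated flags match what a full recomputation with core_flags would give. A complete incremental method would additionally have to merge or split clusters around the affected points, which this sketch leaves out.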

Updated: 2020-09-10