Incremental Density-Based Clustering on Multicore Processors, IEEE Transactions on Pattern Analysis and Machine Intelligence

IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8), Pub Date: 9-10-2020, DOI: 10.1109/tpami.2020.3023125
Son T. Mai ₁ , Jon Jacobsen ₂ , Sihem Amer-Yahia ₃ , Ivor Spence ₁ , Nhat-Phuong Tran ₁ , Ira Assent ₂ , Quoc Viet Hung Nguyen ₄

Density-based clustering is a fundamental data clustering technique with many real-world applications. However, when the database changes frequently, how to effectively update clustering results rather than recluster from scratch remains a challenging task. In this work, we introduce IncAnyDBC, a unique parallel incremental data clustering approach to this problem. First, to reduce update overheads, IncAnyDBC can process changes in bulk rather than in batches, as state-of-the-art methods do. Second, it maintains an underlying cluster structure, called the object node graph, during the clustering process and uses it as the basis for incrementally updating clusters with respect to objects inserted into or deleted from the database, propagating changes only around the affected nodes. In addition, IncAnyDBC actively and iteratively examines the graph and chooses only a small set of the most meaningful objects to produce the exact clustering results of DBSCAN or to approximate results under arbitrary time constraints. This makes it more efficient than other existing methods. Third, by processing objects in blocks, IncAnyDBC can be efficiently parallelized on multicore CPUs, yielding a work-efficient method. It runs much faster than existing techniques when using a single thread while still scaling well with multiple threads. Experiments on various large real datasets demonstrate the performance of IncAnyDBC.
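The abstract states that IncAnyDBC reproduces the exact clustering of DBSCAN. Purely as a point of reference, here is a minimal sketch of plain, non-incremental DBSCAN, assuming Euclidean distance and brute-force O(n²) neighborhood queries. It is paired with a hypothetical helper that illustrates why incremental updates can stay local: inserting one point changes the ε-neighborhood counts only of existing points within ε of it, so only those points can newly become core. This is not the IncAnyDBC algorithm (there is no object node graph, no bulk processing, and no parallelism), and the names `region_query`, `dbscan`, and `newly_core_after_insert` are illustrative assumptions.

```python
import math
from collections import deque

NOISE = -1
UNVISITED = None


def region_query(points, i, eps):
    """Indices of all points within Euclidean distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain brute-force DBSCAN (reference sketch). Returns one label per point; -1 is noise."""
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # previously noise, now a border point; not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also core: keep expanding the cluster
                queue.extend(j_neighbors)
        cluster_id += 1
    return labels


def newly_core_after_insert(points, new_point, eps, min_pts):
    """Hypothetical helper: existing points that become core once new_point is inserted.

    Only points within eps of new_point gain a neighbor, so only they need checking;
    a point turns core exactly when its count goes from min_pts - 1 to min_pts.
    """
    affected = [j for j, q in enumerate(points) if math.dist(q, new_point) <= eps]
    return [j for j in affected if len(region_query(points, j, eps)) == min_pts - 1]
```

For example, with `pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]`, `dbscan(pts, 1.5, 3)` returns `[0, 0, 0, 0, 1, 1, 1, -1]`. The helper only inspects the neighborhood of the inserted point, which is the locality that incremental methods exploit; a full incremental algorithm must also merge or split clusters and handle deletions, which this sketch does not attempt.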
