Improving Density Peaks Clustering through GPU acceleration

https://doi.org/10.1016/j.future.2022.11.033

Highlights

  • A vectorized VP-Tree layout and a parallel VP-Tree construction method are designed.

  • A new method for parallel Density Peaks Clustering computation with GPU acceleration is proposed.

  • A GPU-friendly VP-Tree dynamic update scheme is designed to support incremental clustering.

  • A GPU-friendly incremental DPC method based on the dynamic VP-Tree is designed.

Abstract

Density Peaks Clustering (DPC) is a recently proposed clustering algorithm with distinct advantages over existing clustering algorithms, and it has already been used in a wide range of applications. However, DPC requires computing the distance between every pair of input points, incurring a quadratic computation overhead that is prohibitive for large data sets. To address this efficiency problem, we propose to use the GPU to accelerate DPC. We exploit a spatial index structure, the VP-Tree, to efficiently maintain the data points and propose a GPU-friendly parallel VP-Tree construction algorithm. Based on the constructed VP-Tree, we propose a GPU-accelerated DPC algorithm, GDPC, in which the all-pair computation in DPC is greatly accelerated. Furthermore, in order to process dynamically evolving datasets, we propose an incremental variant, Incremental GDPC. Our results show that GDPC achieves a 5.3–148.9X acceleration compared to state-of-the-art GPU-based, multicore-based, and distributed DPC implementations, and Incremental GDPC achieves a 2.3–40.5X acceleration compared to the state-of-the-art incremental DPC algorithm.

Introduction

Data clustering is one of the most fundamental problems in many real-world applications, such as recommender systems, social networks, image processing, and bioinformatics. Basically, it groups a set of objects based on their similarities, such that objects in the same group (i.e., cluster) are more similar to each other than to those in other groups. Many different clustering algorithms have been proposed in the literature, such as Kmeans [1] and DBSCAN [2]. There are also many research efforts to improve the efficiency of clustering to handle massive data [3], [4], [5].

Density Peaks Clustering (DPC) [6] is a recently proposed clustering algorithm. Given a set of points, DPC computes two metrics for every point p: (i) the local density ρ, which is the number of points within a specified distance from p; and (ii) the dependent distance δ, which is the minimum distance from p to any other point with higher density. It is observed that the center of a cluster has the highest local density among its neighboring points (i.e., it is a density center) and lies relatively far from any point with higher density (i.e., far away from other density centers). Thus, cluster centers can be identified as points with both a large ρ and a large δ. With these identified cluster centers and the dependency trees extracted during the computation of the dependent distance δ, the point-to-center relationship, or in other words, the point-to-cluster assignment (the clustering results), can be discovered.
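
To make these two definitions concrete, the following is a minimal brute-force sketch of their computation as two CUDA kernels with one thread per point; the kernel names, the Euclidean distance, and the convention of leaving δ at FLT_MAX for the global density maximum are illustrative assumptions, not this paper’s implementation, and the quadratic loops are exactly the cost that GDPC later avoids.

    // Minimal brute-force sketch of the two DPC metrics; one thread per point.
    #include <cfloat>

    __device__ float sq_dist(const float* a, const float* b, int dim) {
        float s = 0.0f;
        for (int k = 0; k < dim; ++k) { float d = a[k] - b[k]; s += d * d; }
        return s;
    }

    // Local density rho: number of other points within the cutoff distance dc.
    __global__ void compute_rho(const float* pts, int n, int dim, float dc, int* rho) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        int count = 0;
        for (int j = 0; j < n; ++j)
            if (j != i && sq_dist(&pts[i * dim], &pts[j * dim], dim) <= dc * dc)
                ++count;
        rho[i] = count;
    }

    // Dependent distance delta: distance to the nearest point with higher density.
    // The index of that point (dep) is also recorded; it forms the dependency
    // tree used later for the point-to-cluster assignment.
    __global__ void compute_delta(const float* pts, int n, int dim,
                                  const int* rho, float* delta, int* dep) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        float best = FLT_MAX;
        int best_j = -1;
        for (int j = 0; j < n; ++j) {
            if (rho[j] > rho[i]) {
                float d = sq_dist(&pts[i * dim], &pts[j * dim], dim);
                if (d < best) { best = d; best_j = j; }
            }
        }
        delta[i] = (best_j < 0) ? FLT_MAX : sqrtf(best);  // global density maximum has no dependency
        dep[i] = best_j;
    }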

Compared with previous clustering algorithms, DPC has many advantages. (1) Unlike Kmeans, DPC does not require a pre-specified number of clusters. (2) DPC does not assume the clusters to be “balls” in space and supports arbitrarily shaped clusters. (3) DPC is more deterministic, since its clustering results have been shown to be robust against the initial choice of algorithm parameters. (4) The extraction of (ρ, δ) maps the input data, which can be very high-dimensional, to a two-dimensional representation, making it easier for users to gain new insights from the resulting two-dimensional plot. Due to its effectiveness and novelty, DPC has already been employed in a wide range of applications, such as neuroscience [7], geoscience [8], and computer vision [9].

While DPC is attractive for its effectiveness and its simplicity, the application of DPC is limited by its computational cost. In order to obtain the density values ρ, DPC computes the distance between every pair of points. That is, given N points in the input data set, its computational cost is O(N²). Moreover, in order to obtain the dependent distance values δ, a global sort operation on all points based on their density values (with computational cost O(N log N)) and N(N−1)/2 compare operations are required. As a result, it can be very time consuming to perform DPC on large data sets.

In the past few years, several research efforts have been devoted to accelerating DPC. LSH-DDP [3], EDDPC [10] and FDDP [11] leverage distributed approaches to help DPC handle large-scale datasets. EDMStream [12] improves DPC by efficiently maintaining a novel in-memory dependent-tree structure. Ex-DPC and S-Approx-DPC [13] accelerate DPC by leveraging multi-core processing.

The recent advances in GPU technology offer great prospects for parallel computation [14], [15]. With up to 80 GB of GPU memory [16], it is possible to use the GPU to process large-scale data. Several related works have been devoted to accelerating DPC with the GPU’s parallel processing ability. Li et al. [17] propose a thread/block model and shared-memory designs to accelerate the distance matrix computation. CUDA-DP [18] also exploits the GPU’s parallelism and improves data locality to increase performance. However, these methods only focus on employing the GPU’s many-core features to accelerate DPC, without utilizing spatial index structures that can filter out a large number of unnecessary all-pair computations.

In this paper, we exploit a spatial index structure, the vantage point tree (VP-Tree) [19], to efficiently maintain the clustering data. With a VP-Tree, data points are partitioned into “hypershells” of decreasing radius. Compared with other spatial index structures (such as the KD-Tree [20] and Ball-Tree [21]), the VP-Tree is more appropriate for the DPC algorithm, because its decreasing-radius hypershell structure well supports both the point density computation (which retrieves nearby points within a predefined radius of a point) and the dependent distance computation (which retrieves the distance to the nearest neighbor with higher density). Besides, the VP-Tree is more suitable for clustering high-dimensional data [22]. More importantly, both the construction and the search of the VP-Tree can be well parallelized to fit the GPU’s many-core architecture. Based on the GPU-based VP-Tree, we propose the GDPC algorithm, in which the density ρ and the dependent distance δ are efficiently calculated by querying the index structure, and many unnecessary distance computations (between faraway points) are avoided.
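
As a rough illustration of what a vectorized, GPU-friendly VP-Tree layout can look like, the sketch below keeps each node field in a separate flat array (a structure-of-arrays layout), so that threads working on different nodes read memory in a coalesced way; the field names and the leaf representation are illustrative assumptions, not the exact layout designed in Section 3.

    // Hypothetical structure-of-arrays layout for a GPU-resident VP-Tree.
    // Nodes are stored level by level, so each level is a contiguous slice.
    struct VPTreeSoA {
        int    num_nodes;
        int   *vantage;   // vantage[n]: point id of the vantage point of node n
        float *radius;    // radius[n]:  median distance splitting inner/outer children
        int   *left;      // left[n]:    child with points inside the radius (-1 if leaf)
        int   *right;     // right[n]:   child with points outside the radius (-1 if leaf)
        int   *start;     // start[n], end[n]: range owned by a leaf in a shared,
        int   *end;       //   level-ordered point-id array
    };

Storing each level contiguously also makes a level-by-level parallel build convenient, since all nodes of one level can be processed by a single kernel launch.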

On the other hand, new data are produced every day and the data distribution may evolve over time. As a result, the clustering results should continuously evolve correspondingly. This dynamically evolving nature of data drives us to seek an incremental clustering approach. Rather than re-performing DPC on the whole updated data set, the incremental clustering algorithm leverages previous clustering results and only updates the affected point-to-cluster assignments. Considering the massive size of data sets, such incremental processing is especially desirable in production usage. Therefore, we further propose Incremental GDPC to support incremental clustering, which extends GDPC in the following aspects. (1) We design a GPU-friendly dynamic VP-Tree index update scheme that reduces the number of tree traversals and eliminates write-write conflicts in the GPU’s many-core computations. (2) Based on this dynamic VP-Tree index, we propose Incremental GDPC, which can efficiently update the clustering results.

To sum up, we list our contributions as follows.

  • GPU-Accelerated VP-Tree Construction. We design a vectorized VP-Tree layout to adapt to GPU architecture and take full advantage of GPU parallelism to speed up the VP-Tree index construction.

  • GPU-Accelerated DPC Implementation. We propose to use the VP-Tree index to improve the efficiency of the all-pair computation and rely on this index to avoid unnecessary computations in DPC’s density evaluation and dependent distance evaluation with the GPU’s parallel computation support (see the kernel sketch after this list).

  • GPU-Accelerated Incremental DPC Update. We provide incremental clustering support for dynamically evolving data by designing GPU-friendly incremental update methods for the density and the dependent distance based on the dynamic VP-Tree.
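
As an illustration of the second contribution, the kernel sketch below answers the density query over a VP-Tree: each thread handles one query point, walks the tree with an explicit stack, and uses the triangle inequality to skip hypershells that cannot intersect the query ball of radius dc. The array names follow the hypothetical structure-of-arrays layout sketched earlier, every indexed point is assumed to appear either as exactly one node’s vantage point or in exactly one leaf range, and the whole kernel is an assumption-laden sketch of the idea rather than GDPC’s actual implementation.

    // Hypothetical range-count kernel: rho[q] = number of indexed points
    // (excluding q itself) within distance dc of query point q.
    __device__ float euclid(const float* a, const float* b, int dim) {
        float s = 0.0f;
        for (int k = 0; k < dim; ++k) { float d = a[k] - b[k]; s += d * d; }
        return sqrtf(s);
    }

    __global__ void vptree_range_count(const float* pts, int dim, int n_queries,
                                       const int* vantage, const float* radius,
                                       const int* left, const int* right,
                                       const int* start, const int* end,
                                       const int* point_ids, float dc, int* rho) {
        int q = blockIdx.x * blockDim.x + threadIdx.x;
        if (q >= n_queries) return;
        const float* qp = &pts[q * dim];

        int stack[64];                      // assumes tree depth below 64
        int top = 0;
        stack[top++] = 0;                   // start from the root node
        int count = 0;

        while (top > 0) {
            int node = stack[--top];
            float d = euclid(qp, &pts[vantage[node] * dim], dim);
            if (d <= dc) ++count;           // the vantage point itself
            if (left[node] < 0) {           // leaf: scan its owned points (assumed
                                            // not to repeat the vantage point)
                for (int s = start[node]; s < end[node]; ++s)
                    if (euclid(qp, &pts[point_ids[s] * dim], dim) <= dc)
                        ++count;
                continue;
            }
            // Inner child holds points with dist-to-vantage <= radius, the outer
            // child the rest; visit a child only if the query ball can reach it.
            if (d - dc <= radius[node]) stack[top++] = left[node];
            if (d + dc >= radius[node]) stack[top++] = right[node];
        }
        rho[q] = count - 1;                 // q is indexed, so remove its self-match
    }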

We perform experiments on various real-world datasets and compare with a state-of-the-art GPU-based DPC algorithm CUDA-DP [18], a multicore-based parallel DPC algorithm S-Approx-DPC [13], and a distributed DPC implementation LSH-DDP [3]. Our results show that our GDPC can achieve 5.3–17.8X speedup over CUDA-DP, 43–148.9X speedup over S-Approx-DPC and 44.8–78.8X speedup over LSH-DDP. We further perform experiments on evolving datasets and compare with the state-of-the-art incremental DPC algorithm EDMStream [12]. Our results show that our Incremental GDPC can achieve 2.3–40.5X speedup over EDMStream.

The remainder of the paper is organized as follows. Section 2 describes the background on DPC and GPU’s architecture. Section 3 presents the GPU-accelerated VP-Tree construction and query methods. Section 4 proposes our GPU-accelerated DPC algorithm GDPC. Section 5 introduces how we maintain the dynamic VP-Tree on GPU and proposes Incremental GDPC. Section 6 reports the experimental results. Section 7 discusses related work and Section 8 concludes the paper.

Section snippets

Background and preliminaries

In this section, we first review the standard Density Peaks Clustering (DPC) algorithm. We then introduce the background of GPU architecture and memory hierarchy.

GPU-accelerated VP-Tree construction and query

VP-Tree is the key component in our proposed GPU-accelerated DPC algorithm. In this section, we first discuss why we prefer VP-Tree over other spatial index structures. We then describe the VP-Tree construction and query methods in detail.
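
Before going into the details, a flavor of how one node split can be parallelized is sketched below: the distances from the node’s points to the chosen vantage point are computed in parallel, the median distance becomes the node radius, and the points are partitioned into the inner and outer children. Thrust primitives stand in for hand-written kernels, and the function and type names are illustrative assumptions rather than the construction method proposed in this section.

    // Hypothetical split of one VP-Tree node on the GPU using Thrust.
    #include <cmath>
    #include <thrust/device_vector.h>
    #include <thrust/sort.h>
    #include <thrust/transform.h>

    struct DistToVantage {
        const float* pts; int dim; int vp;                  // vp: vantage point id
        __host__ __device__ float operator()(int p) const {
            float s = 0.0f;
            for (int k = 0; k < dim; ++k) {
                float d = pts[p * dim + k] - pts[vp * dim + k];
                s += d * d;
            }
            return sqrtf(s);
        }
    };

    // ids holds the point ids owned by this node (vantage point already removed).
    // Returns the split radius; afterwards ids[0 .. ids.size()/2) belong to the
    // inner ("within radius") child and the remaining ids to the outer child.
    float split_node(const float* d_pts, int dim, int vp,
                     thrust::device_vector<int>& ids) {
        thrust::device_vector<float> dist(ids.size());
        thrust::transform(ids.begin(), ids.end(), dist.begin(),
                          DistToVantage{d_pts, dim, vp});
        // A full sort is used here for brevity; a selection primitive would
        // suffice to find the median distance.
        thrust::sort_by_key(dist.begin(), dist.end(), ids.begin());
        return dist[ids.size() / 2];                        // median = node radius
    }

In a level-parallel build, such a split would be carried out concurrently for every node of the current level.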

GDPC based on VP-Tree

In this section, we describe our proposed GPU-based DPC algorithm, GDPC, which utilizes the constructed VP-Tree to accelerate DPC. The original DPC algorithm contains three steps: computing the density values ρ, computing the dependent distances δ, and assigning points to clusters. All three steps of GDPC are performed on the GPU. In the following, we describe these steps respectively.
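
For intuition on the third step, the point-to-cluster assignment can be summarized by the sketch below, written as sequential host code for clarity even though GDPC performs this step on the GPU; the dep array (each point’s nearest higher-density point) and the set of chosen centers are assumed as inputs, following the earlier brute-force sketch.

    #include <algorithm>
    #include <vector>

    // Assign every point the label of the cluster center its dependency chain
    // leads to. Assumes every non-center has a valid dependency, i.e. the global
    // density maximum is always chosen as a center.
    std::vector<int> assign_clusters(const std::vector<int>& rho,
                                     const std::vector<int>& dep,
                                     const std::vector<int>& centers) {
        int n = (int)rho.size();
        std::vector<int> label(n, -1);
        for (int c = 0; c < (int)centers.size(); ++c)
            label[centers[c]] = c;                    // each center seeds its own cluster

        // Visit points in decreasing density order so that a point's dependency
        // (a strictly denser point) is already labeled when the point is reached.
        std::vector<int> order(n);
        for (int i = 0; i < n; ++i) order[i] = i;
        std::sort(order.begin(), order.end(),
                  [&](int a, int b) { return rho[a] > rho[b]; });

        for (int i : order)
            if (label[i] == -1)
                label[i] = label[dep[i]];             // inherit the dependency's label
        return label;
    }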

Incremental GDPC

In this section, we present how to handle incremental updates of GDPC clustering results on evolving datasets. Suppose the original dataset is D and the newly added data is ΔD. In order to update the clustering results in response to input changes, we should first update the VP-Tree efficiently with GPU parallelism. In Section 5.1, we propose an incremental update method that can adjust the GPU-based VP-Tree incrementally with the newly added data ΔD. Furthermore, the addition of new points always affects the densities and dependent distances of nearby existing points, so the clustering results themselves must also be updated incrementally.
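
The concrete update scheme is presented in Section 5.1; purely to illustrate how write-write conflicts can be avoided during a batched insertion, the sketch below first locates each new point’s target leaf using read-only traversals (fully parallel), then, after the new points have been grouped by target leaf (e.g., by sorting their ids by leaf id), lets a single thread append each whole group so that no two threads ever write to the same leaf. All names and the fixed-capacity leaf representation are hypothetical and not necessarily what Incremental GDPC actually does.

    // Phase 1: read-only descent, one thread per new point, no writes to the tree.
    __global__ void find_target_leaf(const float* pts, const float* new_pts,
                                     int n_new, int dim,
                                     const int* vantage, const float* radius,
                                     const int* left, const int* right,
                                     int* target_leaf) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n_new) return;
        int node = 0;                                       // root
        while (left[node] >= 0) {                           // descend until a leaf
            float s = 0.0f;
            for (int k = 0; k < dim; ++k) {
                float d = new_pts[i * dim + k] - pts[vantage[node] * dim + k];
                s += d * d;
            }
            node = (sqrtf(s) <= radius[node]) ? left[node] : right[node];
        }
        target_leaf[i] = node;
    }

    // Phase 2: after grouping new-point ids by target leaf (group g covers
    // sorted_ids[group_start[g] .. group_start[g+1])), one thread appends the
    // whole group of a leaf, so writes to a leaf never race. Leaves are stored
    // with slack capacity; handling overflow (leaf splits) is omitted here.
    __global__ void append_groups(const int* sorted_ids, const int* group_start,
                                  const int* group_leaf, int n_groups,
                                  int* leaf_points, int* leaf_size, int leaf_capacity) {
        int g = blockIdx.x * blockDim.x + threadIdx.x;
        if (g >= n_groups) return;
        int leaf = group_leaf[g];
        int sz = leaf_size[leaf];
        for (int s = group_start[g]; s < group_start[g + 1] && sz < leaf_capacity; ++s)
            leaf_points[leaf * leaf_capacity + sz++] = sorted_ids[s];
        leaf_size[leaf] = sz;
    }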

Experiments

This section evaluates GDPC and its incremental variant on real-world datasets to verify their benefits.

Machine Configuration. We conduct all experiments on a server with two 8-core Intel Xeon Silver 4110 CPUs @ 2.1 GHz (32 logical CPUs in total) and 64 GB of host memory, equipped with an NVIDIA RTX 2080 Ti GPU. The GPU has 68 SMs and 11 GB of GDDR6 memory with a peak bandwidth of 616 GB/s. Our implementation is compiled with CUDA 10 using the nvcc optimization flag -O3.

Dataset. Table 1 lists the data sets used in our experiments.

Related work

Parallel clustering makes full use of the resources of multiple processors by running the clustering algorithm on multiple processors simultaneously, which greatly shortens the execution time of the clustering algorithm and provides an effective solution for large-scale data clustering analysis.

In recent years, general-purpose graphics processing units (GPGPUs) have been widely used to facilitate processing-intensive operations thanks to their parallel processing ability. There exist a number of studies

Conclusion

In this paper, we propose a parallel Density Peaks Clustering algorithm named GDPC, which can fully utilize the powerful computation resources of the GPU. It leverages a GPU-friendly spatial index, the VP-Tree, to reduce unnecessary distance calculations. Both the VP-Tree construction process and the DP clustering process are greatly improved by utilizing the GPU’s parallel optimizations. Our results show that GDPC can achieve a 5.3–148.9X speedup over the state-of-the-art DPC implementations and our Incremental GDPC can achieve a 2.3–40.5X speedup over the state-of-the-art incremental DPC algorithm.

CRediT authorship contribution statement

Zhuojin Liu: Conceptualization, Methodology, Software, Writing – original draft, Writing – review & editing. Shufeng Gong: Conceptualization, Methodology, Writing – original draft, Writing – review & editing. Yuxuan Su: Methodology, Software, Writing – original draft. Changyi Wan: Methodology, Software, Writing – original draft. Yanfeng Zhang: Conceptualization, Methodology, Supervision, Writing – review & editing. Ge Yu: Project administration, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The work is supported by the National Natural Science Foundation of China (62072082, U2241212, U1811261, 62202088), the Key R&D Program of Liaoning Province (2020JH2/10100037), and the Fundamental Research Funds for the Central Universities (N2216015, N2216012).

References (43)

  • G. Andrade, et al., G-DBSCAN: A GPU accelerated algorithm for density-based clustering, Procedia Comput. Sci. (2013)
  • S. Lloyd, Least squares quantization in PCM, IEEE Trans. Inform. Theory (1982)
  • M. Ester, H.P. Kriegel, J. Sander, X. Xu, et al., A density-based algorithm for discovering clusters in large spatial...
  • Y. Zhang, et al., Efficient distributed density peaks for clustering large data sets in MapReduce, IEEE Trans. Knowl. Data Eng. (2016)
  • Y. Wang, Y. Gu, J. Shun, Theoretically-efficient and practical parallel DBSCAN, in: 2020 ACM SIGMOD International...
  • J. Gan, Y. Tao, DBSCAN revisited: Mis-claim, un-fixability, and approximation, in: 2015 ACM SIGMOD International...
  • A. Rodriguez, et al., Clustering by fast search and find of density peaks, Science (2014)
  • D. Kobak, et al., Demixed principal component analysis of population activity in higher cortical areas reveals independent representation of task parameters (2014)
  • K. Sun, et al., Exemplar component analysis: A fast band selection method for hyperspectral imagery, Geosci. Remote Sens. Lett. (2015)
  • K.M. Dean, et al., High-speed multiparameter photophysical analyses of fluorophore libraries, Anal. Chem. (2015)
  • S. Gong, et al., EDDPC: An efficient distributed density peaks clustering algorithm, J. Comput. Res. Dev. (2016)
  • J. Lu, et al., Distributed density peaks clustering revisited, IEEE Trans. Knowl. Data Eng. (2022)
  • S. Gong, et al., Clustering stream data by exploring the evolution of density mountain, VLDB Endow. (2017)
  • D. Amagata, T. Hara, Fast Density-Peaks Clustering: Multicore-based Parallelization Approach, in: 2021 ACM SIGMOD...
  • Q. Wang, et al., HyTGraph: GPU-Accelerated Graph Processing with Hybrid Transfer Management, CoRR (2022)
  • Q. Wang, Y. Zhang, H. Wang, C. Chen, X. Zhang, G. Yu, NeutronStar: Distributed GNN Training with Hybrid Dependency...
  • Nvidia A100 Tensor Core GPU, URL...
  • M. Li, J. Huang, J. Wang, Paralleled Fast Search and Find of Density Peaks clustering algorithm on GPUs with CUDA, in:...
  • K. Ge, et al., Efficient parallel implementation of a density peaks clustering algorithm on graphics processing unit, Front. Inform. Technol. Electron. Eng. (2017)
  • P.N. Yianilos, Data Structures and Algorithms for Nearest Neighbor Search in General Metric Spaces, in: ACM-SIAM...
  • J.L. Bentley, Multidimensional binary search trees used for associative searching, Commun. ACM (1975)

Zhuojin Liu is currently working towards a graduate degree in computer science at Northeastern University. Her research interests include parallel and high-performance computing. She works on designing efficient algorithms and data structures for processing large-scale data sets.

Shufeng Gong received the Ph.D. degree in computer science from Northeastern University, China, in 2021. He is currently a lecturer with Northeastern University, Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, China. His research interests include cloud computing, distributed graph processing, and data mining.

Yuxuan Su received the master's degree in computer technology from Northeastern University, China, in 2021. Her research interests include database indexing and high-performance computing.

Changyi Wan received the master's degree in computer technology from Northeastern University, China, in 2021. His main research areas include high-performance computing and machine learning.

Yanfeng Zhang received the Ph.D. degree in computer science from Northeastern University, China, in 2012. He is currently a professor with Northeastern University, Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, China. His research interests include distributed systems and big data processing. He has published many papers in these areas. His paper in SoCC 2011 was honored with a 'Paper of Distinction' award.

Ge Yu (Senior Member, IEEE) received the Ph.D. degree in computer science from Kyushu University, Japan, in 1996. He is currently a professor with Northeastern University, China. His current research interests include distributed and parallel systems, cloud computing and big data management, and blockchain techniques and systems. He has published more than 200 papers in refereed journals and conferences. He is an ACM member, an IEEE senior member, and a CCF fellow.
