Improving Density Peaks Clustering through GPU acceleration
Introduction
Data clustering is one of the most fundamental problems in many real-world applications, such as recommender systems, social networks, image processing, and bioinformatics. It groups a set of objects based on their similarities such that objects in the same group (i.e., cluster) are more similar to each other than to those in other groups. Many clustering algorithms have been proposed in the literature, such as Kmeans [1] and DBSCAN [2], and there have also been many research efforts to improve the efficiency of clustering to handle massive data [3], [4], [5].
Density Peaks Clustering (DPC) [6] is a recently proposed clustering algorithm. Given a set of points, DPC computes two metrics for every point p: (i) the local density ρ, which is the number of points within a specified cutoff distance d_c of p; and (ii) the dependent distance δ, which is the minimum distance from p to any point with higher density. The key observation is that the center of a cluster has the highest local density among its neighboring points (i.e., it is a density center) and lies relatively far from other points with higher densities (i.e., far away from other density centers). Thus, cluster centers can be identified as points with both large ρ and large δ. With these identified cluster centers and the density-center dependency trees extracted during the computation of δ, the point-to-center relationship, in other words the point-to-cluster assignment (the clustering result), can be derived.
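To make the two metrics concrete, the following is a minimal NumPy sketch of the standard (non-accelerated) DPC metric computation on a small dataset. The function name and the toy setup are ours; the all-pair distance matrix it builds is exactly the expensive step that index-based acceleration targets.

```python
import numpy as np

def dpc_metrics(points, d_c):
    """Compute DPC's two per-point metrics.

    rho[i]  : number of other points within cutoff distance d_c of point i
    delta[i]: distance from point i to its nearest neighbor of higher density
              (for the global density peak, the maximum distance to any point)
    """
    n = len(points)
    # All-pair Euclidean distances: the O(n^2) step.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    rho = (dist < d_c).sum(axis=1) - 1  # exclude the point itself
    delta = np.empty(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]
        if higher.size == 0:            # global density peak
            delta[i] = dist[i].max()
        else:
            delta[i] = dist[i, higher].min()
    return rho, delta
```

On a synthetic dataset with two tight blobs, the two blob centers come out with both the largest ρ and the largest δ, which is exactly how DPC identifies cluster centers.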
Compared with previous clustering algorithms, DPC has many advantages. (1) Unlike Kmeans, DPC does not require a pre-specified number of clusters. (2) DPC does not assume clusters to be “balls” in space and supports arbitrarily shaped clusters. (3) DPC is more deterministic, since its clustering results have been shown to be robust against the initial choice of algorithm parameters. (4) The extraction of (ρ, δ) provides a two-dimensional representation of the input data, which may be very high-dimensional, making it easier for users to gain new insights from the two-dimensional plot of the data. Due to its effectiveness and novelty, DPC has already been employed in a wide range of applications, such as neuroscience [7], geoscience [8], and computer vision [9].
While DPC is attractive for its effectiveness and simplicity, its application is limited by its computational cost. In order to obtain the density values ρ, DPC computes the distance between every pair of points; that is, given n points in the input dataset, the computational cost is O(n²). Moreover, in order to obtain the dependent distance values δ, a global sort of all points by their density values (with computational cost O(n log n)) and O(n²) compare operations are required. As a result, it can be very time-consuming to perform DPC on large datasets.
In the past few years, several research efforts have been devoted to accelerating DPC. LSH-DDP [3], EDDPC [10], and FDDP [11] leverage distributed approaches to help DPC handle large-scale datasets. EDMStream [12] improves DPC by efficiently maintaining a novel in-memory dependent-tree structure. Ex-DPC and S-Approx-DPC [13] accelerate DPC by leveraging multi-core processing.
Recent advances in GPU technology offer great prospects for parallel computation [14], [15]. With up to 80 GB of GPU memory [16], it is possible to use GPUs to process large-scale data. Several related works have been devoted to accelerating DPC using the GPU’s parallel processing ability. Li et al. [17] propose a thread/block model and shared-memory designs to accelerate the distance matrix computation. CUDA-DP [18] also exploits the GPU’s parallelism and improves data locality to increase performance. However, these methods only focus on employing the GPU’s many-core features to accelerate DPC, without utilizing spatial index structures, which can filter out a large number of unnecessary all-pair computations.
In this paper, we exploit a spatial index structure, the vantage point tree (VP-Tree) [19], to efficiently maintain the clustering data. With a VP-Tree, data points are recursively partitioned into “hypershells” of decreasing radius. Compared with other spatial index structures (such as the KD-Tree [20] and the Ball-Tree [21]), the VP-Tree is more appropriate for the DPC algorithm, because its decreasing-radius hypershell structure naturally supports both the point density computation (which retrieves the points within a predefined radius of a point) and the dependent distance computation (which retrieves the distance to the nearest neighbor with higher density). Moreover, the VP-Tree is more suitable for clustering high-dimensional data [22]. More importantly, both the construction and the search of the VP-Tree can be well parallelized to fit the GPU’s many-core architecture. Based on the GPU-based VP-Tree, we propose the GDPC algorithm, in which the density and the dependent distance can be efficiently calculated by querying the index structure, so that many unnecessary distance computations (between faraway points) are avoided.
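To illustrate why the VP-Tree fits DPC’s density pass, here is a minimal pointer-based VP-Tree with a range-counting query. This is a sequential CPU sketch with names of our own choosing; the paper’s version uses a vectorized GPU layout. Each node stores a vantage point and the median distance μ to the remaining points, so its two children cover an inner ball and an outer shell, and the triangle inequality prunes whole subtrees during a radius query.

```python
import random

class VPNode:
    __slots__ = ("point", "radius", "inside", "outside")
    def __init__(self, point, radius, inside, outside):
        self.point, self.radius = point, radius
        self.inside, self.outside = inside, outside

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def build_vp_tree(points, rng=random.Random(0)):
    """Pick a vantage point, split the rest by the median distance to it:
    the 'inside' child holds the inner ball, 'outside' the outer shell."""
    if not points:
        return None
    points = list(points)
    vp = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vp, 0.0, None, None)
    mu = sorted(dist(vp, p) for p in points)[len(points) // 2]
    inside = [p for p in points if dist(vp, p) < mu]
    outside = [p for p in points if dist(vp, p) >= mu]
    return VPNode(vp, mu, build_vp_tree(inside, rng), build_vp_tree(outside, rng))

def range_count(node, q, r):
    """Count points within radius r of q. The triangle inequality prunes
    subtrees whose region cannot intersect the query ball, which is how a
    density pass over the tree skips faraway pairs entirely."""
    if node is None:
        return 0
    d = dist(node.point, q)
    count = 1 if d <= r else 0
    if d - r < node.radius:    # query ball may reach the inner ball
        count += range_count(node.inside, q, r)
    if d + r >= node.radius:   # query ball may reach the outer shell
        count += range_count(node.outside, q, r)
    return count
```

A query that lands inside one dense blob never descends into subtrees holding the other blob, so the count matches a brute-force scan while touching far fewer points.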
On the other hand, new data are produced every day and the data distribution may evolve over time. As a result, the clustering results should evolve correspondingly. This dynamically evolving nature of the data drives us to seek an incremental clustering approach. Rather than re-running DPC on the whole updated dataset, an incremental clustering algorithm leverages the previous clustering results and only updates the affected point-to-cluster assignments. Considering the massive size of the datasets, such incremental processing is especially desirable in production use. Therefore, we further propose Incremental GDPC to support incremental clustering, which extends GDPC in the following aspects. (1) We design a GPU-friendly dynamic VP-Tree index update scheme that reduces the number of tree traversals and eliminates write-write conflicts in the GPU’s many-core computations. (2) Based on this dynamic VP-Tree index, we propose Incremental GDPC, which can efficiently update the clustering results.
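The incremental idea can be illustrated at the level of the density metric: when a batch of points is inserted, only existing points within the cutoff distance of some new point can see their density change, so only those entries need updating rather than recomputing all pairs. The sketch below shows this with plain NumPy (the paper performs the equivalent filtering through the dynamic VP-Tree on the GPU; the function and variable names are ours).

```python
import numpy as np

def incremental_rho(points, rho, new_points, d_c):
    """Update densities after inserting new_points, touching only the
    affected entries of rho instead of redoing the full all-pair pass."""
    all_pts = np.vstack([points, new_points])
    # Distances from each new point to every point (old and new).
    d_new = np.linalg.norm(new_points[:, None, :] - all_pts[None, :, :], axis=2)
    new_rho = (d_new < d_c).sum(axis=1) - 1   # subtract each point's self-match
    # Each existing point gains one count per new point within d_c of it.
    d_old = d_new[:, :len(points)]            # new-vs-old distances only
    rho = rho + (d_old < d_c).sum(axis=0)
    return np.concatenate([rho, new_rho]), all_pts
```

Points farther than d_c from every new point keep their old density untouched, which is the property that lets an index restrict the update to a small neighborhood of the insertions.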
To sum up, we list our contributions in the following.
- GPU-Accelerated VP-Tree Construction. We design a vectorized VP-Tree layout adapted to the GPU architecture and take full advantage of GPU parallelism to speed up VP-Tree index construction.
- GPU-Accelerated DPC Implementation. We propose to use the VP-Tree index to improve the efficiency of the all-pair computation, relying on this index to avoid unnecessary computations in DPC’s density evaluation and dependent distance evaluation with the GPU’s parallel computation support.
- GPU-Accelerated Incremental DPC Update. We provide incremental clustering support for dynamically evolving data by designing GPU-friendly incremental update methods for the density and the dependent distance based on the dynamic VP-Tree.
We perform experiments on various real-world datasets and compare with a state-of-the-art GPU-based DPC algorithm CUDA-DP [18], a multicore-based parallel DPC algorithm S-Approx-DPC [13], and a distributed DPC implementation LSH-DDP [3]. Our results show that our GDPC can achieve 5.3–17.8X speedup over CUDA-DP, 43–148.9X speedup over S-Approx-DPC and 44.8–78.8X speedup over LSH-DDP. We further perform experiments on evolving datasets and compare with the state-of-the-art incremental DPC algorithm EDMStream [12]. Our results show that our Incremental GDPC can achieve 2.3–40.5X speedup over EDMStream.
The remainder of the paper is organized as follows. Section 2 describes the background on DPC and GPU’s architecture. Section 3 presents the GPU-accelerated VP-Tree construction and query methods. Section 4 proposes our GPU-accelerated DPC algorithm GDPC. Section 5 introduces how we maintain the dynamic VP-Tree on GPU and proposes Incremental GDPC. Section 6 reports the experimental results. Section 7 discusses related work and Section 8 concludes the paper.
Background and preliminaries
In this section, we first review the standard Density Peaks Clustering (DPC) algorithm. We then introduce the background of GPU architecture and memory hierarchy.
GPU-accelerated VP-Tree construction and query
The VP-Tree is the key component in our proposed GPU-accelerated DPC algorithm. In this section, we first discuss why we prefer the VP-Tree over other spatial index structures. We then describe the VP-Tree construction and query methods in detail.
GDPC based on VP-Tree
In this section, we describe our proposed GPU-based DPC algorithm, GDPC, which utilizes the constructed VP-Tree to accelerate DPC. The original DPC algorithm contains three steps: computing the density values ρ, computing the dependent distances δ, and assigning points to clusters. All three steps of GDPC are performed on the GPU. In the following, we describe these steps respectively.
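The assignment step of standard DPC can be sketched as follows: once every non-center point knows its dependent point (its nearest neighbor of higher density), labels propagate from the chosen centers in a single pass over the points in decreasing-density order. This is a CPU sketch with our own names; `nneigh` is assumed to come from the dependent distance computation, and ties in density are ignored for simplicity.

```python
import numpy as np

def assign_clusters(rho, nneigh, centers):
    """Propagate cluster labels down the dependency tree.

    rho[i]    : density of point i
    nneigh[i] : index of i's nearest neighbor with higher density
    centers   : indices chosen as cluster centers (large rho and delta)
    """
    labels = np.full(len(rho), -1)
    for k, c in enumerate(centers):
        labels[c] = k
    # Visiting points in decreasing density guarantees that each point's
    # dependent point has already been labeled when we reach it.
    for i in np.argsort(-rho):
        if labels[i] < 0:
            labels[i] = labels[nneigh[i]]
    return labels
```

Because each point simply inherits the label of its dependent point, the whole assignment is a single linear scan after the sort, which is why this step parallelizes well once ρ and δ are available.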
Incremental GDPC
In this section, we present how to handle incremental updates of GDPC clustering results on evolving datasets. Suppose we have an original dataset and a batch of newly added data. In order to update the clustering results in response to input changes, we first need to update the VP-Tree efficiently with GPU parallelism. In Section 5.1, we propose an incremental update method that adjusts the GPU-based VP-Tree incrementally with the newly added data. Furthermore, the addition of new points always
Experiments
This section evaluates GDPC and its incremental variant on real-world datasets to verify their benefits.
Machine Configuration. We conduct all experiments on a server with two 8-core Intel Xeon Silver 4110 CPUs @ 2.1 GHz (32 logical CPUs) and 64 GB host memory, equipped with an NVIDIA RTX 2080Ti GPU. The GPU has 68 SMs and 11 GB GDDR6 memory with a peak bandwidth of 616 GB/s. Our implementation is compiled with CUDA 10 and the nvcc optimization flag -O3.
Dataset. Table 1 lists the data sets used in our experiments.
Related work
Parallel clustering makes full use of multiple processors by running the clustering algorithm on them simultaneously, which greatly shortens execution time and provides an effective solution for large-scale data clustering analysis.
In recent years, general-purpose graphics processing units (GPGPU) have been widely used to facilitate processing-intensive operations for their parallel processing ability. There exist a number of studies
Conclusion
In this paper, we propose a parallel density peaks algorithm named GDPC, which can fully utilize the powerful computation resources of GPU. It leverages a GPU-friendly spatial index VP-Tree to reduce unnecessary distance calculations. The VP-Tree construction process and the DP clustering process are greatly improved by utilizing GPU’s parallel optimizations. Our results show that GDPC can achieve 5.3–148.9X speedup over the state-of-the-art DPC implementations and our Incremental GDPC can
CRediT authorship contribution statement
Zhuojin Liu: Conceptualization, Methodology, Software, Writing – original draft, Writing – review & editing. Shufeng Gong: Conceptualization, Methodology, Writing – original draft, Writing – review & editing. Yuxuan Su: Methodology, Software, Writing – original draft. Changyi Wan: Methodology, Software, Writing – original draft. Yanfeng Zhang: Conceptualization, Methodology, Supervision, Writing – review & editing. Ge Yu: Project administration, Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The work is supported by the National Natural Science Foundation of China (62072082, U2241212, U1811261, 62202088), the Key R&D Program of Liaoning Province (2020JH2/10100037), and the Fundamental Research Funds for the Central Universities (N2216015, N2216012).
References (43)
- et al., G-DBSCAN: A GPU accelerated algorithm for density-based clustering, Procedia Comput. Sci. (2013)
- Least squares quantization in PCM, IEEE Trans. Inform. Theory (1982)
- M. Ester, H.P. Kriegel, J. Sander, X. Xu, et al., A density-based algorithm for discovering clusters in large spatial...
- et al., Efficient distributed density peaks for clustering large data sets in MapReduce, IEEE Trans. Knowl. Data Eng. (2016)
- Y. Wang, Y. Gu, J. Shun, Theoretically-efficient and practical parallel DBSCAN, in: 2020 ACM SIGMOD International...
- J. Gan, Y. Tao, DBSCAN revisited: Mis-claim, un-fixability, and approximation, in: 2015 ACM SIGMOD International...
- et al., Clustering by fast search and find of density peaks, Science (2014)
- et al., Demixed principal component analysis of population activity in higher cortical areas reveals independent representation of task parameters (2014)
- et al., Exemplar component analysis: A fast band selection method for hyperspectral imagery, Geosci. Remote Sens. Lett. (2015)
- et al., High-speed multiparameter photophysical analyses of fluorophore libraries, Anal. Chem. (2015)
- EDDPC: An efficient distributed density peaks clustering algorithm, J. Comput. Res. Dev.
- Distributed density peaks clustering revisited, IEEE Trans. Knowl. Data Eng.
- Clustering stream data by exploring the evolution of density mountain, VLDB Endow.
- HyTGraph: GPU-accelerated graph processing with hybrid transfer management, CoRR
- Efficient parallel implementation of a density peaks clustering algorithm on graphics processing unit, Front. Inform. Technol. Electron. Eng.
- Multidimensional binary search trees used for associative searching, Commun. ACM
Zhuojin Liu is currently working towards a graduate degree in computer science at Northeastern University. Her research interests include parallel and high-performance computing. She works on designing efficient algorithms and data structures for processing large-scale data sets.
Shufeng Gong received the Ph.D. degree in computer science from Northeastern University, China, in 2021. He is currently a lecturer with Northeastern University, Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, China. His research interests include cloud computing, distributed graph processing, and data mining.
Yuxuan Su received the master degree in computer technology from Northeastern University, China, in 2021. Her research interests include database indexing and high-performance computing.
Changyi Wan received the master degree in computer technology from Northeastern University, China, in 2021. His main area of research includes high-performance computing and machine learning.
Yanfeng Zhang received the Ph.D. degree in computer science from Northeastern University, China, in 2012. He is currently a professor with Northeastern University, Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, China. His research consists of distributed systems and big data processing. He has published many papers in the above areas. His paper in SoCC 2011 was honored with ‘Paper of Distinction’.
Ge Yu (Senior Member, IEEE) received the Ph.D. degree in computer science from Kyushu University, Japan, in 1996. He is currently a professor with Northeastern University, China. His current research interests include distributed and parallel systems, cloud computing and big data management, blockchain techniques and systems. He has published more than 200 papers in refereed journals and conferences. He is the ACM member, the IEEE senior member, and the CCF fellow.