Abstract
Clustering is an important part of data mining. Existing clustering algorithms often fail on data sets with uneven density distributions. In this paper, we propose IDDC, a novel relative density-based clustering algorithm for identifying diverse density clusters effectively. It can identify clusters in data sets with different densities and can also handle outliers. We first compute the relative density of each data point. Then the density peak points are screened, and the initial clusters are obtained from these peak points. The remaining points are assigned by searching for unallocated points from the perspective of each cluster, which allows clusters of different densities to be identified effectively. In experiments, we compare IDDC with several existing algorithms on synthetic and real-world data sets. The results show that IDDC performs better than those existing algorithms, especially on data sets with uneven density distributions.
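The abstract does not give the formula for relative density, so the following is only a minimal sketch of the general idea, not the definition used in IDDC. It assumes a k-nearest-neighbor formulation: the local density of a point is the inverse of its mean distance to its k nearest neighbors, and its relative density is that value divided by the mean density of those neighbors. The function name `knn_relative_density`, the parameter `k`, and the small stabilizing constant are all hypothetical choices made for illustration.

```python
import numpy as np


def knn_relative_density(points, k=5):
    """Illustrative relative density (an assumed definition, not the paper's).

    Returns (density, relative), both arrays of length n.
    Assumes the points are distinct and that n > k.
    """
    points = np.asarray(points, dtype=float)

    # Pairwise Euclidean distances, shape (n, n).
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))

    # Indices of the k nearest neighbors; column 0 of the sorted order is the
    # point itself (distance 0) when all points are distinct, so it is skipped.
    order = np.argsort(dists, axis=1)
    neighbors = order[:, 1:k + 1]

    # Local density: inverse of the mean distance to the k nearest neighbors.
    knn_dists = np.take_along_axis(dists, neighbors, axis=1)
    density = 1.0 / (knn_dists.mean(axis=1) + 1e-12)

    # Relative density: own density divided by the neighbors' mean density.
    relative = density / density[neighbors].mean(axis=1)
    return density, relative
```

Under this assumed definition, a point in the middle of a sparse cluster and a point in the middle of a dense cluster of the same shape receive similar relative densities even though their raw densities differ, while an isolated point whose nearest neighbors lie in a dense region receives a relative density far below 1. That normalization is the reason a relative measure can suit data with uneven density; how IDDC screens peak points and assigns the remaining points is not specified in the abstract and is not sketched here.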
Acknowledgements
This work was supported by National Natural Science Foundation of China grant 61573266.
About this article
Cite this article
Wang, Y., Yang, Y. Relative density-based clustering algorithm for identifying diverse density clusters effectively. Neural Comput & Applic 33, 10141–10157 (2021). https://doi.org/10.1007/s00521-021-05777-2
DOI: https://doi.org/10.1007/s00521-021-05777-2