GDPC: generalized density peaks clustering algorithm based on order similarity
International Journal of Machine Learning and Cybernetics (IF 3.1), Pub Date: 2020-09-20, DOI: 10.1007/s13042-020-01198-0
Xiaofei Yang, Zhiling Cai, Ruijia Li, William Zhu

Clustering is a fundamental approach to discovering valuable information in data mining and machine learning. Density peaks clustering (DPC) is a typical density-based clustering method and has received increasing attention in recent years. However, DPC and most of its improved variants still suffer from some drawbacks. For example, it is difficult to find peaks in sparse cluster regions, and the assignment of the remaining points tends to cause a domino effect, especially for complicated data. To address these two problems, we propose a generalized density peaks clustering algorithm (GDPC) based on a new order similarity, which is calculated from the order rank of the Euclidean distance between two samples. The order similarity helps to find peaks in sparse regions. In addition, a two-step assignment is used to weaken the domino effect. In general, GDPC can not only discover clusters in datasets of different sizes, dimensions and shapes, but also address the above two issues. Experiments on several datasets, including Lung, COIL20, ORL, USPS, MNIST, Breast and Vote, show that our algorithm is effective in most cases.
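
The abstract does not give the formula for the order similarity, so the following is only a minimal sketch of the general idea of ranking samples by Euclidean distance. The function names `order_ranks` and `order_dissimilarity`, the averaging of the two directed ranks, and the toy data are all assumptions made for illustration; they are not taken from the paper.

```python
import math


def order_ranks(points):
    """Return ranks[i][j]: the position of sample j when all other samples
    are sorted by Euclidean distance from sample i (1 = nearest neighbour).
    ranks[i][i] is left at 0."""
    n = len(points)
    ranks = [[0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # list.sort is stable, so equal distances keep index order.
        others.sort(key=lambda j: math.dist(points[i], points[j]))
        for position, j in enumerate(others, start=1):
            ranks[i][j] = position
    return ranks


def order_dissimilarity(ranks, i, j):
    """Hypothetical symmetric combination of the two directed ranks.
    This is an illustrative assumption, not the paper's definition."""
    return (ranks[i][j] + ranks[j][i]) / 2


# Toy data (assumed): one dense group and one sparse group.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),  # dense group
    (5.0, 5.0), (7.0, 5.0), (5.0, 7.0),  # sparse group
]
ranks = order_ranks(points)

dense_pair = order_dissimilarity(ranks, 0, 1)   # raw distance 0.1
sparse_pair = order_dissimilarity(ranks, 3, 4)  # raw distance 2.0
print(dense_pair, sparse_pair)
```

In this toy setup both pairs receive the same rank-based value even though their raw distances differ by a factor of 20. This suggests why a rank-based measure may make sparse regions look comparable to dense ones when searching for peaks.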




Updated: 2020-09-20