A method of two-stage clustering learning based on improved DBSCAN and density peak algorithm
Computer Communications ( IF 6 ) Pub Date : 2021-01-06 , DOI: 10.1016/j.comcom.2020.12.019
Mingyang Li , Xinhua Bi , Limin Wang , Xuming Han

Density peak (DP) and density-based spatial clustering of applications with noise (DBSCAN) are representative density-based clustering algorithms in unsupervised learning. Both can cluster data of arbitrary shape and identify noise samples in a data set. However, the DP algorithm relies on a decision graph to select cluster centers, so users without prior knowledge find it difficult to identify the centers automatically and accurately. The clustering performance of DBSCAN, in turn, is highly sensitive to the settings of its two parameters, Eps and MinPts. To address these issues, we propose a new two-stage clustering method based on improved DBSCAN and the DP algorithm (TSCM). In the first stage, an improved DBSCAN algorithm based on bat optimization generates initial clusters. Specifically, the improved DBSCAN uses Silhouette, a well-known internal clustering validation index that requires no labels, as the fitness function that guides parameter determination by bat optimization. In the second stage, the cluster centers in the decision graph are selected automatically according to the initial clusters, and the final clusters are obtained by DP with those centers. Experiments show that, relative to DP and DBSCAN, TSCM effectively removes the manual intervention required for cluster center selection in DP and parameter setting in DBSCAN, and that clustering performance is significantly improved.
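
The second stage described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function name `dp_from_initial_clusters`, the cutoff distance `d_c`, the Gaussian-kernel density, and the rule "the center of each initial cluster is its highest-density member" are all assumptions made for illustration, since the abstract does not specify them. The first stage (DBSCAN tuned by bat optimization with Silhouette as fitness) is not shown; its output is assumed to arrive as `initial_labels`, with -1 marking noise.

```python
import math


def dp_from_initial_clusters(points, initial_labels, d_c):
    """Density-peak style assignment seeded by initial cluster labels.

    points         : list of coordinate tuples
    initial_labels : one label per point from a first-stage clustering
                     (-1 = noise); assumed input, not computed here
    d_c            : cutoff distance for the density kernel (assumed parameter)
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho, using a Gaussian kernel over pairwise distances.
    rho = [
        sum(math.exp(-((dist[i][j] / d_c) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]

    # Indices sorted from densest to sparsest.
    order = sorted(range(n), key=lambda i: -rho[i])

    # For each point, the nearest point that precedes it in density order
    # (None for the densest point overall).
    parent = [None] * n
    for rank, i in enumerate(order):
        if rank > 0:
            parent[i] = min(order[:rank], key=lambda j: dist[i][j])

    # Assumed rule: each initial cluster contributes one center,
    # namely its highest-density member.
    centers = {}
    for i in order:
        c = initial_labels[i]
        if c != -1 and c not in centers:
            centers[c] = i

    labels = [-1] * n
    for c, i in centers.items():
        labels[i] = c

    # Every other point inherits the label of its nearest denser neighbour.
    # Points are visited in density order, so the parent is already labelled.
    for i in order:
        if labels[i] == -1 and parent[i] is not None:
            labels[i] = labels[parent[i]]

    return labels, centers
```

For example, given two well-separated groups of five points each, where the first stage left one point per group marked as noise, the sketch selects the middle point of each group as its center and reassigns the noise points to the nearest cluster. Under this simplified rule, a point whose chain of denser neighbours never reaches a center keeps the label -1.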




Updated: 2021-01-06