GNN-DBSCAN: A new density-based algorithm using grid and the nearest neighbor
Journal of Intelligent & Fuzzy Systems (IF 1.7), Pub Date: 2021-08-23, DOI: 10.3233/jifs-211922
Li Yihong, Wang Yunpeng, Li Tao, Lan Xiaolong, Song Han

DBSCAN (density-based spatial clustering of applications with noise) is one of the most widely used density-based clustering algorithms. It can find clusters of arbitrary shape, determine the number of clusters, and identify noise samples automatically. However, its performance is significantly limited by its sensitivity to two parameters, eps and MinPts: eps is the radius of the eps-neighborhood, and MinPts is the minimum number of points required within that neighborhood. Additionally, because these parameters are fixed, DBSCAN tends to fail on datasets with large variations in density. To overcome these limitations, we propose a new density-based clustering algorithm called GNN-DBSCAN, which uses an adaptive Grid to divide the dataset and defines local core samples using the Nearest Neighbor. The grid divides the data space into a finite number of cells. The nearest neighbors lying in each filled cell and its adjacent filled cells are then defined as the local core samples. Next, GNN-DBSCAN obtains global core samples by enhancing and screening the local core samples. In this way, our algorithm can identify higher-quality core samples than DBSCAN. Lastly, given these global core samples, the algorithm clusters the dataset using a dynamic radius based on k-nearest neighbors. The dynamic radius overcomes the problems caused by DBSCAN's fixed eps parameter, so our method performs better on datasets with large variations in density. Experiments were conducted on synthetic and real-world datasets. The results indicate that the proposed algorithm outperforms the existing algorithms DBSCAN, DPC, ADBSCAN, and HDBSCAN in average Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), Adjusted Mutual Information (AMI), and V-measure.
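
The two ingredients named in the abstract, dividing the data space into grid cells and using a per-point radius derived from k-nearest neighbors, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `grid_cells` and `knn_radius`, the fixed `cell_size` (the paper's grid is adaptive), and the choice of "distance to the k-th nearest neighbor" as the radius are all assumptions made for illustration only.

```python
import math
from collections import defaultdict


def grid_cells(points, cell_size):
    """Map each filled cell (tuple of integer cell coordinates) to the
    indices of the points inside it. A fixed cell_size is an assumption;
    the paper describes an adaptive grid."""
    cells = defaultdict(list)
    for i, p in enumerate(points):
        key = tuple(math.floor(c / cell_size) for c in p)
        cells[key].append(i)
    return dict(cells)


def knn_radius(points, k):
    """Hypothetical dynamic radius: for each point, the distance to its
    k-th nearest other point (brute force, O(n^2 log n))."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii


# One dense group near the origin and one sparse group near (5, 5).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]

cells = grid_cells(points, cell_size=1.0)
radii = knn_radius(points, k=2)

for i, r in enumerate(radii):
    print(f"point {i}: radius {r:.3f}")
```

In this toy dataset the dense group receives small radii and the sparse group receives large ones, which is the kind of behavior a single fixed eps cannot provide.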

Updated: 2021-08-24