An efficient automated incremental density-based algorithm for clustering and classification
Future Generation Computer Systems (IF 7.5), Pub Date: 2020-08-29, DOI: 10.1016/j.future.2020.08.031
Elham Azhir, Nima Jafari Navimipour, Mehdi Hosseinzadeh, Arash Sharifi, Aso Darwesh

Data clustering divides a dataset into different groups. Incremental DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a well-known density-based clustering technique that can find clusters of varying sizes and shapes. The quality of incremental DBSCAN results depends on two input parameters: MinPts (the minimum number of points) and Eps (epsilon, the neighborhood radius). Parameter setting is therefore one of the major problems of incremental DBSCAN. To address this issue, this article presents an improved incremental DBSCAN based on the Non-dominated Sorting Genetic Algorithm II (NSGA-II). The proposed algorithm iteratively adjusts the two parameters (MinPts and Eps) of incremental DBSCAN, guided by fitness functions, to improve clustering precision. Moreover, our method introduces suitable fitness functions for both labeled and unlabeled datasets. We also improve the efficiency of the proposed hybrid algorithm by parallelizing the optimization process. The method is evaluated on several textual and numerical datasets of different shapes, sizes, and dimensions. The experimental results show that the proposed algorithm outperforms Multi-Objective Particle Swarm Optimization (MOPSO)-based incremental DBSCAN and several well-known techniques, particularly on shape datasets and balanced datasets. In addition, the parallel model achieves good speed-up over the serial version of the algorithm.
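
To make the roles of MinPts and Eps and of NSGA-II's ranking step concrete, here is a minimal Python sketch. It is not the authors' implementation. `dbscan` is a plain batch (non-incremental) DBSCAN with brute-force neighborhood queries, and `non_dominated_fronts` is a simple (not the "fast") version of the non-dominated sorting that NSGA-II uses to rank candidate solutions. In the proposed method each candidate would be a (MinPts, Eps) pair scored by the paper's fitness functions. The abstract does not specify those functions, so they are not reproduced here. All names and the toy data are hypothetical.

```python
from collections import deque
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Toy batch DBSCAN (not incremental): one label per point, cluster id >= 0 or NOISE."""
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def neighbors(i):
        # Brute-force Eps-neighborhood query; includes the point itself.
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return labels


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_fronts(objs):
    """Split candidate indices into successive Pareto fronts (simple, not 'fast', sorting)."""
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


if __name__ == "__main__":
    # Hypothetical toy data: two tight groups plus one far-away outlier.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
    print(dbscan(pts, eps=0.5, min_pts=3))  # Eps too small: every point is noise
    print(dbscan(pts, eps=100, min_pts=3))  # Eps too large: one single cluster
    # Hypothetical two-objective scores for five candidate parameter settings.
    print(non_dominated_fronts([(1, 5), (2, 2), (3, 1), (4, 4), (5, 5)]))  # [[0, 1, 2], [3], [4]]
```

The three `dbscan` calls on the same data show why parameter setting matters. A small Eps labels every point as noise, and a large Eps merges everything into a single cluster. Only an intermediate value recovers the two groups. A multi-objective search such as NSGA-II explores this space automatically and prefers candidates in the earliest non-dominated front.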




Updated: 2020-08-29