FEM-DBSCAN: An Efficient Density-Based Clustering Approach
Iranian Journal of Science and Technology, Transactions of Electrical Engineering (IF 2.4), Pub Date: 2021-01-05, DOI: 10.1007/s40998-020-00396-4
Uranus Kazemi, Reza Boostani

Because data generation across networks keeps growing uncontrollably, there is strong demand for fast clustering of massive datasets, both to reveal the hidden structure of the data and to discover relations among samples. Among clustering approaches, density-based methods have shown acceptable processing speed on big data. However, they rely on fixed parameters that are not necessarily optimal for every region of the feature space. Moreover, their computational complexity depends heavily on the number of samples. In this paper, we use Fisher expectation maximization (FEM) to adaptively divide the feature space into subspaces such that no cluster is shared between adjacent subspaces. We then apply density-based spatial clustering of applications with noise (DBSCAN) to each partition, which reduces the computational complexity on each thread and lets the DBSCAN parameters be learned separately on each subspace. The proposed method was evaluated on three large and ten medium-sized datasets. The results indicate that it outperforms OPTICS, DENCLUE, and DBSCAN in both clustering accuracy (purity) and processing time.
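
The partition-then-cluster idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the FEM step is not reproduced here, and the partition ids are simply passed in as an input (a hypothetical `partition_of` list). The helper names `dbscan` and `partitioned_dbscan`, the toy points, and the per-partition `(eps, min_pts)` values are all assumptions chosen for illustration. The DBSCAN below is a plain quadratic-time version written in pure Python.

```python
from collections import deque
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN; returns one label per point, -1 for noise."""
    n = len(points)
    # Each neighborhood includes the point itself.
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = deque(neighbors[i])
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:  # j is a core point: keep expanding
                queue.extend(neighbors[j])
        cluster += 1
    return labels

def partitioned_dbscan(points, partition_of, params):
    """Run DBSCAN independently in each partition with its own parameters.

    partition_of: one partition id per point (assumed to come from an
                  FEM-like step, which this sketch does not implement).
    params: dict mapping partition id -> (eps, min_pts).
    """
    labels = [NOISE] * len(points)
    offset = 0  # keeps cluster ids unique across partitions
    for pid in sorted(set(partition_of)):
        idx = [i for i, p in enumerate(partition_of) if p == pid]
        eps, min_pts = params[pid]
        local = dbscan([points[i] for i in idx], eps, min_pts)
        for i, lab in zip(idx, local):
            if lab != NOISE:
                labels[i] = lab + offset
        offset += max(local, default=NOISE) + 1
    return labels

# Toy data: a dense group near the origin, a sparse group, and one outlier.
points = [(0, 0), (0.2, 0), (0, 0.2), (0.2, 0.2),
          (10, 10), (12, 10), (10, 12), (12, 12), (5, 5)]
partition_of = [0, 0, 0, 0, 1, 1, 1, 1, 1]   # hypothetical partition ids
params = {0: (0.5, 3), 1: (3.0, 3)}          # hypothetical per-partition settings
labels = partitioned_dbscan(points, partition_of, params)
print(labels)
```

In this toy case a single global `eps` of 0.5 would mark the sparse group as noise, while the per-partition setting recovers both groups. Because the partitions are processed independently, each loop iteration could in principle run on its own thread or process.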

Updated: 2021-01-05