ASCRClu: an adaptive subspace combination and reduction algorithm for clustering of high-dimensional data
Pattern Analysis and Applications (IF 3.9), Pub Date: 2020-04-22, DOI: 10.1007/s10044-020-00884-7
Kavan Fatehi, Mohsen Rezvani, Mansoor Fateh

The curse of dimensionality is one of the major challenges in clustering high-dimensional data. A considerable amount of recent literature addresses this challenge through subspace clustering, whose main objective is to discover clusters embedded in any possible combination of the attributes. Previous approaches mostly generate redundant subspace clusters, which reduces clustering accuracy and increases running time. In this paper, a bottom-up density-based approach is proposed for clustering high-dimensional data. We employ the cluster structure as a similarity measure to generate the optimal subspaces, which raises the accuracy of the subspace clustering. Using this idea, we propose an iterative algorithm that discovers similar subspaces from the similarity in the features of the subspaces. At each iteration, the algorithm first determines similar subspaces, then combines them to generate higher-dimensional subspaces, and finally re-clusters the combined subspaces. The algorithm repeats these steps until it converges to the final clusters. Experiments on various synthetic and real datasets show that the proposed approach is significantly better in both quality and runtime compared with the state of the art in clustering high-dimensional data. The accuracy of the proposed method is around 34% higher than that of the CLIQUE algorithm and around 6% higher than that of DiSH.
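
The loop described in the abstract (find similar subspaces, combine them, re-cluster) can be illustrated with a toy sketch. This is not the ASCRClu algorithm itself: the abstract does not specify the clustering routine, the similarity measure, or the stopping rule, so everything below is an assumption made for illustration. The clustering is a simple connected-components grouping with a distance threshold `eps`, the similarity is the Jaccard overlap of two subspaces' cluster sets, and the names `cluster`, `similarity`, `combine_subspaces`, and `threshold` are hypothetical.

```python
from itertools import combinations

def cluster(points, dims, eps=0.5, min_size=2):
    """Toy density-based clustering (assumption, not the paper's routine):
    connected components of the graph linking points that lie within eps
    of each other in every dimension of `dims`."""
    unvisited = set(range(len(points)))
    clusters = set()
    while unvisited:
        seed = unvisited.pop()
        comp, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in unvisited
                    if all(abs(points[i][d] - points[j][d]) <= eps for d in dims)}
            unvisited -= near
            comp |= near
            frontier.extend(near)
        if len(comp) >= min_size:          # drop isolated points as noise
            clusters.add(frozenset(comp))
    return clusters

def similarity(ca, cb):
    """Hypothetical cluster-structure similarity: Jaccard overlap of the
    two sets of clusters (clusters match only if memberships are equal)."""
    if not ca or not cb:
        return 0.0
    return len(ca & cb) / len(ca | cb)

def combine_subspaces(points, threshold=0.5, eps=0.5):
    """Bottom-up loop: start from 1-D subspaces, repeatedly merge the most
    similar pair and re-cluster, stop when no pair reaches `threshold`."""
    n_dims = len(points[0])
    subspaces = {frozenset([d]): cluster(points, [d], eps) for d in range(n_dims)}
    while True:
        score, a, b = max(
            ((similarity(subspaces[a], subspaces[b]), a, b)
             for a, b in combinations(subspaces, 2)),
            key=lambda t: t[0], default=(0.0, None, None))
        if a is None or score < threshold:
            return subspaces
        merged = a | b
        del subspaces[a], subspaces[b]
        # Re-cluster in the combined, higher-dimensional subspace.
        subspaces[merged] = cluster(points, sorted(merged), eps)

# Dimensions 0 and 1 share the same two groups; dimension 2 groups differently.
POINTS = [
    (0.0, 0.1, 0.0), (0.2, 0.0, 5.0), (0.1, 0.3, 10.0),
    (5.0, 5.2, 0.2), (5.1, 5.0, 5.1), (5.3, 5.1, 10.2),
]
result = combine_subspaces(POINTS)
print(sorted(sorted(s) for s in result))   # [[0, 1], [2]]
```

In this sketch, each merge reduces the number of subspaces by one, so the loop always terminates. Merging only the single best pair per iteration keeps the example short; a real implementation would likely combine several similar subspaces per pass.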

Updated: 2020-04-22