Flexible Subspace Clustering: A Joint Feature Selection and K-Means Clustering Framework
Big Data Research (IF 3.3), Pub Date: 2020-11-12, DOI: 10.1016/j.bdr.2020.100170
Zhong-Zhen Long, Guoxia Xu, Jiao Du, Hu Zhu, Taiyu Yan, Yu-Feng Yu

Cloud computing is an important computing paradigm for handling large, distributed databases with relatively simple computations. Within this paradigm, data mining is one of the most important and fundamental problems: sensors and other intelligent devices generate large amounts of data, and mining these big data is crucial in many applications. K-means clustering is a typical technique for grouping similar data into the same cluster and is widely used in data mining. However, it still struggles with data that contain substantial noise, outliers, and redundant features. In this paper, we propose a robust K-means clustering algorithm called flexible subspace clustering. The proposed method integrates feature selection and K-means clustering into a unified framework, which selects informative features and improves clustering performance. To further enhance robustness, the ℓ2,p-norm is embedded into the objective function; p can be chosen flexibly to suit different data, yielding more robust performance. Experimental results on benchmark databases show that the proposed method is more robust and performs better than existing approaches.
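The abstract does not spell out the objective function, so the following is only a minimal sketch, not the authors' algorithm. It assumes the common definition of the ℓ2,p-norm of a matrix M with rows m^i, ‖M‖_{2,p} = (Σ_i ‖m^i‖_2^p)^{1/p}, and applies its p-th power to the K-means residuals, i.e. it minimises Σ_i ‖x_i − c_{y_i}‖_2^p. The feature-selection part of the framework is left out, the iteratively reweighted least-squares (IRLS) centre update is a solver chosen here purely for illustration, and the names `l2p_norm_p`, `objective` and `robust_kmeans`, as well as the toy data, are hypothetical. With p = 2 the sketch reduces to ordinary Lloyd K-means; smaller p down-weights points that lie far from their centre.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (l2) distance between two points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def l2p_norm_p(rows: Sequence[Sequence[float]], p: float) -> float:
    """||M||_{2,p}^p = sum_i ||m^i||_2^p over the rows m^i of M."""
    return sum(math.sqrt(sum(v * v for v in row)) ** p for row in rows)


def objective(X: Matrix, centers: Matrix, labels: List[int], p: float) -> float:
    """l_{2,p} loss of the residual rows x_i - c_{y_i}, i.e. ||X - YC||_{2,p}^p."""
    residuals = [[xd - cd for xd, cd in zip(x, centers[j])] for x, j in zip(X, labels)]
    return l2p_norm_p(residuals, p)


def _assign(X: Matrix, centers: Matrix) -> List[int]:
    # For any p > 0, minimising ||x - c||^p is the same as minimising ||x - c||,
    # so the assignment step is the usual nearest-centre rule.
    return [min(range(len(centers)), key=lambda j: _dist(x, centers[j])) for x in X]


def robust_kmeans(X: Matrix, init_centers: Matrix, p: float = 1.0,
                  n_iter: int = 50, eps: float = 1e-8) -> Tuple[Matrix, List[int]]:
    """Hypothetical sketch: alternately minimise sum_i ||x_i - c_{y_i}||_2^p."""
    if not 0 < p <= 2:
        raise ValueError("this sketch assumes 0 < p <= 2")
    centers = [list(map(float, c)) for c in init_centers]
    dim = len(X[0])
    for _ in range(n_iter):
        labels = _assign(X, centers)
        for j in range(len(centers)):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if not members:
                continue  # leave an empty cluster's centre where it is
            # One IRLS step: the stationarity condition of sum ||x - c||^p gives
            # c = sum(w x) / sum(w) with w = ||x - c||^(p - 2), evaluated at the
            # current centre; eps avoids raising 0 to a negative power.
            w = [max(_dist(x, centers[j]), eps) ** (p - 2) for x in members]
            total = sum(w)
            centers[j] = [sum(wi * x[d] for wi, x in zip(w, members)) / total
                          for d in range(dim)]
    return centers, _assign(X, centers)


if __name__ == "__main__":
    X = [[0, 0], [0, 1], [1, 0], [1, 1],          # group A
         [10, 10], [10, 11], [11, 10], [11, 11],  # group B
         [60, 60]]                                # synthetic outlier
    for p in (2.0, 0.5):
        centers, labels = robust_kmeans(X, [[2, 2], [8, 8]], p=p)
        print(p, labels)
    # 2.0 [0, 0, 0, 0, 0, 0, 0, 0, 1]
    # 0.5 [0, 0, 0, 0, 1, 1, 1, 1, 1]
```

On this synthetic data, p = 2 gives the outlier a centre of its own and merges the two genuine groups, whereas p = 0.5 recovers both groups and lets the outlier join the nearer one. For p < 1 even the centre update is non-convex, so results can depend on the initial centres.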




Updated: 2020-11-16