Robust deep k-means: An effective and simple method for data clustering
Pattern Recognition (IF 7.5), Pub Date: 2021-04-28, DOI: 10.1016/j.patcog.2021.107996
Shudong Huang, Zhao Kang, Zenglin Xu, Quanhui Liu

Clustering aims to partition an input dataset into distinct groups according to some distance or similarity measure. One of the most widely used clustering methods today is the k-means algorithm, owing to its simplicity and efficiency. Over the last few decades, k-means and its many extensions have been formulated to solve practical clustering problems. However, existing clustering methods are often presented in a single-layer (i.e., shallow) formulation. As a result, the mapping between the obtained low-level representation and the original input data may contain rather complex hierarchical information. To overcome the drawbacks of low-level features, deep learning techniques are adopted to extract deep representations and improve clustering performance. In this paper, we propose a robust deep k-means model to learn the hidden representations associated with different implicit lower-level attributes. By using the deep structure to perform k-means hierarchically, the hierarchical semantics of the data can be exploited layer by layer. Data samples from the same class are forced closer together at each successive layer, which benefits the clustering task. The objective function of our model is rewritten in a more tractable form, so that the optimization problem can be solved more easily and robust final results can be obtained. Experimental results on 12 benchmark data sets substantiate that the proposed model achieves a breakthrough in clustering performance compared with both classical and state-of-the-art methods.
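
For orientation, the single-layer (shallow) k-means baseline that the abstract contrasts with can be sketched as follows. This is a minimal illustration of standard Lloyd-style k-means in NumPy, not the robust deep k-means model proposed in the paper; the function name `kmeans`, its parameters, and the toy data are assumptions made for this example only.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain (shallow) k-means; X has shape (n_samples, n_features).

    Illustrative baseline only, not the paper's deep model.
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct samples chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy data (hypothetical): two well-separated 2-D blobs of 20 points each.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
               rng.normal(5.0, 0.1, size=(20, 2))])
labels, centroids = kmeans(X, k=2)
```

Here every sample is assigned directly from the raw input space in one step. The abstract's proposal, by contrast, applies k-means hierarchically across several layers of learned representations.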




Updated: 2021-05-09