Improving spectral clustering with deep embedding, cluster estimation and metric learning, Knowledge and Information Systems

Knowledge and Information Systems (IF 2.7), Pub Date: 2020-11-22, DOI: 10.1007/s10115-020-01530-8
Liang Duan, Shuai Ma, Charu Aggarwal, Saket Sathe

Spectral clustering is one of the most popular modern clustering algorithms. It is easy to implement, can be solved efficiently, and very often outperforms other traditional clustering algorithms such as k-means. However, spectral clustering could be insufficient when dealing with most datasets having complex statistical properties, and it requires users to specify the number k of clusters and a good distance metric to construct the similarity graph. To address these problems, in this article, we propose an approach to extending spectral clustering with deep embedding, cluster estimation, and metric learning. First, we generate the deep embedding via learning a deep autoencoder, which transforms the raw data into their lower dimensional representations suitable for clustering. Second, we provide an effective method to estimate the number of clusters by learning a softmax autoencoder from the deep embedding. Third, we construct a more powerful similarity graph by learning a distance metric from the embedding using a Siamese network. Finally, we conduct an extensive experimental study on image and text datasets, which verifies the effectiveness and efficiency of our approach.
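
For orientation, the baseline that the abstract starts from can be sketched in a few lines. The code below is not the authors' method. It is a minimal, standard-library-only illustration of plain spectral clustering for the special case k = 2, and it hard-codes the two inputs the abstract says users normally have to supply: the number of clusters and the distance metric. The Gaussian kernel on Euclidean distance, the `sigma` value, the power-iteration solver, and the function names are all assumptions made for this example.

```python
import math
import random


def gaussian_affinity(points, sigma=1.0):
    """Dense similarity graph: W[i][j] = exp(-||xi - xj||^2 / (2 sigma^2)), zero diagonal."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sq = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            W[i][j] = W[j][i] = math.exp(-sq / (2.0 * sigma ** 2))
    return W


def spectral_bipartition(points, sigma=1.0, iters=500, seed=0):
    """Split points into two clusters (labels 0/1) using the second eigenvector
    of the normalized affinity M = D^{-1/2} W D^{-1/2}.

    Assumes every point has at least one neighbor with nonzero similarity.
    """
    n = len(points)
    W = gaussian_affinity(points, sigma)
    deg = [sum(row) for row in W]
    inv_sqrt = [1.0 / math.sqrt(d) for d in deg]

    # The top eigenvector of M (eigenvalue 1) is proportional to sqrt(deg).
    v1 = [math.sqrt(d) for d in deg]
    norm1 = math.sqrt(sum(x * x for x in v1))
    v1 = [x / norm1 for x in v1]

    # Power iteration on (M + I) / 2, whose eigenvalues lie in [0, 1],
    # with the trivial eigenvector projected out at every step.
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        dot = sum(a * b for a, b in zip(v, v1))
        v = [a - dot * b for a, b in zip(v, v1)]
        Mv = [
            inv_sqrt[i] * sum(W[i][j] * inv_sqrt[j] * v[j] for j in range(n))
            for i in range(n)
        ]
        v = [0.5 * (m + x) for m, x in zip(Mv, v)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]

    # Rescale by D^{-1/2}; the sign of each entry gives the cluster label.
    embedding = [v[i] * inv_sqrt[i] for i in range(n)]
    return [1 if x > 0 else 0 for x in embedding]


if __name__ == "__main__":
    # Two small, well-separated toy blobs (made-up data for illustration).
    pts = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
           (5.0, 5.0), (5.2, 5.1), (5.1, 4.8), (4.9, 5.2)]
    print(spectral_bipartition(pts))
```

Which blob receives label 0 and which receives label 1 depends on the sign of the eigenvector, so only the grouping is meaningful. In this sketch both `sigma` and the choice of two clusters are fixed by hand; according to the abstract, the proposed approach instead estimates the number of clusters and learns the distance metric from a deep embedding.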

Updated: 2020-11-22
