Pattern Recognition Letters

Volume 138, October 2020, Pages 185-192

Deep k-Means: Jointly clustering with k-Means and learning representations

https://doi.org/10.1016/j.patrec.2020.07.028

Highlights

  • Differentiable reformulation of the k-Means problem in a learned embedding space.

  • Proposal of an alternative to pretraining based on deterministic annealing.

  • Straightforward training algorithm based on stochastic gradient descent.

  • Careful comparison against k-Means-related and deep clustering approaches.

Abstract

We study in this paper the problem of jointly clustering and learning representations. As several previous studies have shown, learning representations that are both faithful to the data to be clustered and adapted to the clustering algorithm can lead to better clustering performance, all the more so when the two tasks are performed jointly. We propose here such an approach for k-Means clustering, based on a continuous reparametrization of the objective function that leads to a truly joint solution. The behavior of our approach is illustrated on various datasets, showing its efficacy at learning representations for objects while clustering them.

Introduction

Clustering is a long-standing problem in the machine learning and data mining fields, and has accordingly fostered abundant research. Traditional clustering methods, e.g., k-Means [22] and Gaussian Mixture Models (GMMs) [5], fully rely on the original data representations and may therefore be ineffective when the data points (e.g., images and text documents) live in a high-dimensional space – a problem commonly known as the curse of dimensionality. Significant progress has been made in the last decade or so to learn better, low-dimensional data representations [12]. The most successful techniques for obtaining such high-quality representations rely on deep neural networks (DNNs), which apply successive non-linear transformations to the data in order to obtain increasingly high-level features. Auto-encoders (AEs) are a special instance of DNNs trained to embed the data into a (usually dense and low-dimensional) vector at the bottleneck of the network, and then to reconstruct the input from this vector. The appeal of AEs lies in the fact that they are able to learn representations in a fully unsupervised way. The representation learning breakthrough enabled by DNNs spurred the recent development of numerous deep clustering approaches which aim at jointly learning the data points’ representations as well as their cluster assignments.
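As a concrete illustration of the auto-encoder principle described above, the following sketch trains a small fully connected auto-encoder from reconstruction error alone; the layer sizes, the 784-dimensional inputs, and the 10-dimensional bottleneck are illustrative placeholders, not the architecture used in the paper.

```python
# Minimal auto-encoder sketch (PyTorch); sizes are placeholders for illustration only.
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, input_dim=784, embed_dim=10):
        super().__init__()
        # Encoder: maps the input to a low-dimensional embedding at the bottleneck.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 500), nn.ReLU(),
            nn.Linear(500, embed_dim),
        )
        # Decoder: attempts to reconstruct the input from the bottleneck vector.
        self.decoder = nn.Sequential(
            nn.Linear(embed_dim, 500), nn.ReLU(),
            nn.Linear(500, input_dim),
        )

    def forward(self, x):
        h = self.encoder(x)      # learned representation
        x_hat = self.decoder(h)  # reconstruction of the input
        return h, x_hat

# Unsupervised training step: only the reconstruction error is needed, no labels.
model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)          # dummy minibatch standing in for real data
h, x_hat = model(x)
loss = nn.functional.mse_loss(x_hat, x)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```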

In this study, we specifically focus on the k-Means-related deep clustering problem. Contrary to previous approaches that alternate between continuous gradient updates and discrete cluster assignment steps [29], we show here that one can rely solely on gradient updates to learn, truly jointly, representations and clustering parameters. This ultimately leads to a better deep k-Means method which is also more scalable, as it can fully benefit from the efficiency of stochastic gradient descent (SGD). In addition, we perform a careful comparison of different methods by (a) relying on the same auto-encoders, as the choice of auto-encoder impacts the results obtained, (b) tuning the hyperparameters of each method on a small validation set, instead of setting them without clear criteria, and (c) enforcing, whenever possible, that the same initialization and sequence of SGD minibatches are used by the different methods. The last point is crucial when comparing methods, as these two factors play an important role and the variance of each method is usually not negligible.
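As a rough illustration of what such purely gradient-based joint training can look like, the sketch below combines a reconstruction loss with a softmax-based soft k-Means loss and updates the auto-encoder weights and the cluster representatives with a single optimizer. The weighting term `lam`, the inverse temperature `alpha`, and the dummy data loader are assumptions for illustration, not the paper's exact formulation or hyperparameters; the `AutoEncoder` class is the one sketched in the introduction.

```python
# Hedged sketch of joint training: representations and cluster representatives are
# both parameters of one differentiable loss and receive plain gradient updates.
import torch
import torch.nn as nn
import torch.nn.functional as F

K, embed_dim, lam, alpha = 10, 10, 1.0, 1.0       # placeholder hyperparameters
model = AutoEncoder(input_dim=784, embed_dim=embed_dim)
reps = nn.Parameter(torch.randn(K, embed_dim))    # cluster representatives r_1..r_K
optimizer = torch.optim.Adam(list(model.parameters()) + [reps], lr=1e-3)

loader = [torch.rand(64, 784) for _ in range(5)]  # dummy minibatches for illustration
for x in loader:
    h, x_hat = model(x)
    # Squared Euclidean distances between embeddings and all representatives.
    dist = torch.cdist(h, reps).pow(2)            # shape (batch, K)
    # Differentiable surrogate of the hard assignment: softmax over negative distances.
    weights = F.softmax(-alpha * dist, dim=1)
    cluster_loss = (weights * dist).sum(dim=1).mean()
    recon_loss = F.mse_loss(x_hat, x)
    loss = recon_loss + lam * cluster_loss        # one loss, one gradient step for everything
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

This is the sense in which the method is "truly joint": there is no separate discrete assignment step, so the auto-encoder and the representatives are updated together by SGD on every minibatch.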

Section snippets

Related work

In the wake of the groundbreaking results obtained by DNNs in computer vision, several deep clustering algorithms were specifically designed for image clustering [7], [9], [13], [14], [30]. These works have in common the exploitation of Convolutional Neural Networks (CNNs), which extensively contributed to last decade’s significant advances in computer vision. Inspired by agglomerative clustering, Yang et al. [30] proposed a recurrent process which successively merges clusters and learns image …

Deep k-Means

In the remainder, $x$ denotes an object from a set $\mathcal{X}$ of objects to be clustered. $\mathbb{R}^p$ represents the space in which learned data representations are to be embedded. $K$ is the number of clusters to be obtained, $r_k \in \mathbb{R}^p$ the representative of cluster $k$, $1 \le k \le K$, and $\mathcal{R} = \{r_1, \ldots, r_K\}$ the set of representatives. The functions $f$ and $g$ define distances in $\mathbb{R}^p$ and are assumed to be fully differentiable with respect to their variables. For any vector $y \in \mathbb{R}^p$, $c_f(y; \mathcal{R})$ gives the closest representative of $y$ according to $f$.
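To make the notation concrete, the sketch below implements a hard closest-representative lookup of the kind denoted $c_f(y; \mathcal{R})$, together with a softmax-based soft counterpart whose sharpness is controlled by an inverse-temperature parameter, in the spirit of the deterministic-annealing alternative mentioned in the highlights. The squared Euclidean distance and the parameter name `alpha` are assumptions for illustration.

```python
# Illustration of the notation: hard assignment c_f(y; R) and a smooth surrogate.
import torch
import torch.nn.functional as F

def closest_representative(y, reps):
    """Hard assignment c_f(y; R): index and value of the nearest representative."""
    dist = ((reps - y) ** 2).sum(dim=1)   # f(y, r_k) for every k (squared Euclidean here)
    k = torch.argmin(dist)
    return k, reps[k]

def soft_assignment(y, reps, alpha):
    """Soft surrogate: softmax weights over representatives; nearly one-hot for large alpha."""
    dist = ((reps - y) ** 2).sum(dim=1)
    return F.softmax(-alpha * dist, dim=0)

reps = torch.randn(4, 2)                  # K = 4 representatives in R^p with p = 2
y = torch.randn(2)
print(closest_representative(y, reps)[0])
print(soft_assignment(y, reps, alpha=1.0))     # smooth, fully differentiable weights
print(soft_assignment(y, reps, alpha=1000.0))  # approaches the hard assignment in the limit
```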

The …

Experiments

In order to evaluate the clustering results of our approach, we conducted experiments on different datasets and compared it against state-of-the-art standard and k-Means-related deep clustering models.

Conclusion

We have presented in this paper a new approach for jointly clustering with k-Means and learning representations by considering the k-Means clustering loss as the limit of a differentiable function. While several studies have proposed solutions to this problem with different clustering losses, to the best of our knowledge, this is the first approach that truly jointly optimizes, through simple stochastic gradient descent updates, the representation and k-Means clustering losses. In addition to …

Declaration of Competing Interest

The authors declare that they have no conflict of interest.

References (30)

  • K. Rose et al., A deterministic annealing approach to clustering, Pattern Recognit. Lett. (1990)
  • E. Agustsson et al., Soft-to-hard vector quantization for end-to-end learning compressible representations, Proceedings of the 31st Annual Conference on Neural Information Processing Systems, NIPS ’17 (2017)
  • E. Aljalbout, V. Golkov, Y. Siddiqui, D. Cremers, Clustering with Deep Learning: Taxonomy and New Methods, ...
  • D. Arthur et al., K-Means++: the advantages of careful seeding, Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’07 (2007)
  • Y. Bengio et al., Greedy layer-wise training of deep networks, Proceedings of the 20th Annual Conference on Neural Information Processing Systems, NIPS ’06 (2006)
  • C.M. Bishop, Pattern Recognition and Machine Learning (2006)
  • D. Cai et al., Locally consistent concept factorization for document clustering, IEEE Trans. Knowl. Data Eng. (2011)
  • J. Chang et al., Deep adaptive image clustering, Proceedings of the 2017 IEEE International Conference on Computer Vision, ICCV ’17 (2017)
  • N. Dilokthanakul, P.A.M. Mediano, M. Garnelo, M.C.H. Lee, H. Salimbeni, K. Arulkumaran, M. Shanahan, Deep Unsupervised ...
  • K.G. Dizaji et al., Deep clustering via joint convolutional autoencoder embedding and relative entropy minimization, Proceedings of the 2017 IEEE International Conference on Computer Vision, ICCV ’17 (2017)
  • X. Glorot et al., Understanding the difficulty of training deep feedforward neural networks, Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, AISTATS ’10 (2010)
  • X. Guo et al., Improved deep embedded clustering with local structure preservation, Proceedings of the 26th International Joint Conference on Artificial Intelligence, IJCAI ’17 (2017)
  • G.E. Hinton et al., Reducing the dimensionality of data with neural networks, Science (2006)
  • C.-C. Hsu et al., CNN-based joint clustering and representation learning with feature drift compensation for large-scale image data, IEEE Trans. Multimed. (2018)
  • W. Hu et al., Learning discrete representations via information maximizing self-augmented training, Proceedings of the 34th International Conference on Machine Learning, ICML ’17 (2017)

Handled by Associate Editor: Andrea Torsello.
