Deep k-Means: Jointly clustering with k-Means and learning representations☆
Introduction
Clustering is a long-standing problem in the machine learning and data mining fields, and has accordingly fostered abundant research. Traditional clustering methods, e.g., k-Means [22] and Gaussian Mixture Models (GMMs) [5], fully rely on the original data representations and may thus be ineffective when the data points (e.g., images and text documents) live in a high-dimensional space, a problem commonly known as the curse of dimensionality. Significant progress has been made in the last decade or so to learn better, low-dimensional data representations [12]. The most successful techniques for obtaining such high-quality representations rely on deep neural networks (DNNs), which apply successive non-linear transformations to the data in order to obtain increasingly high-level features. Auto-encoders (AEs) are a special instance of DNNs trained to embed the data into a (usually dense and low-dimensional) vector at the bottleneck of the network, and then to reconstruct the input from this vector. The appeal of AEs lies in the fact that they can learn representations in a fully unsupervised way. The representation learning breakthrough enabled by DNNs has spurred the recent development of numerous deep clustering approaches, which aim at jointly learning the data points' representations as well as their cluster assignments.
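To fix ideas, the following is a minimal auto-encoder sketch; the layer sizes, ReLU activations, and MSE reconstruction loss are illustrative assumptions, not the architecture used in the paper.

```python
# Minimal auto-encoder sketch (illustrative assumptions: layer sizes,
# ReLU activations, and MSE loss are not the paper's exact choices).
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, input_dim=784, embed_dim=10):
        super().__init__()
        # Encoder: successive non-linear transformations down to the bottleneck.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 500), nn.ReLU(),
            nn.Linear(500, embed_dim),   # low-dimensional representation
        )
        # Decoder: attempts to reconstruct the input from the bottleneck vector.
        self.decoder = nn.Sequential(
            nn.Linear(embed_dim, 500), nn.ReLU(),
            nn.Linear(500, input_dim),
        )

    def forward(self, x):
        h = self.encoder(x)              # embedding used downstream for clustering
        return h, self.decoder(h)

# Training is fully unsupervised: the only signal is the reconstruction error.
model = AutoEncoder()
x = torch.rand(32, 784)                  # a toy minibatch
h, x_hat = model(x)
loss = nn.functional.mse_loss(x_hat, x)
loss.backward()
```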
In this study, we specifically focus on the k-Means-related deep clustering problem. Contrary to previous approaches that alternate between continuous gradient updates and discrete cluster assignment steps [29], we show here that one can solely rely on gradient updates to learn, truly jointly, representations and clustering parameters. This ultimately leads to a better deep k-Means method which is also more scalable as it can fully benefit from the efficiency of stochastic gradient descent (SGD). In addition, we perform a careful comparison of different methods by (a) relying on the same auto-encoders, as the choice of auto-encoders impacts the results obtained, (b) tuning the hyperparameters of each method on a small validation set, instead of setting them without clear criteria, and (c) enforcing, whenever possible, that the same initialization and sequence of SGD minibatches are used by the different methods. The last point is crucial to compare different methods as these two factors play an important role and the variance of each method is usually not negligible.
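To make the "gradient updates only" idea concrete, the sketch below relaxes the hard cluster assignment with a sharp softmax over distances, so that the encoder, decoder, and cluster representatives are all updated by plain SGD in a single backward pass. This is an assumption-laden illustration, not the paper's exact algorithm: the names lambda_ and alpha, the fixed (non-annealed) softmax sharpness, and all sizes are illustrative choices.

```python
# Sketch of a purely gradient-based deep k-Means step. All hyperparameters
# and the fixed softmax sharpness alpha are assumptions for illustration.
import torch
import torch.nn as nn

torch.manual_seed(0)
K, input_dim, embed_dim = 10, 784, 10           # illustrative sizes
lambda_, alpha, lr = 1.0, 1000.0, 1e-3          # large alpha ~ near-hard assignments

encoder = nn.Sequential(nn.Linear(input_dim, 500), nn.ReLU(), nn.Linear(500, embed_dim))
decoder = nn.Sequential(nn.Linear(embed_dim, 500), nn.ReLU(), nn.Linear(500, input_dim))
R = torch.randn(K, embed_dim, requires_grad=True)    # cluster representatives

# One optimizer over all parameters: no alternating discrete assignment step.
opt = torch.optim.SGD(list(encoder.parameters()) + list(decoder.parameters()) + [R], lr=lr)

loader = [torch.rand(32, input_dim) for _ in range(10)]  # toy stand-in for minibatches
for x in loader:
    h = encoder(x)                                  # embeddings
    x_hat = decoder(h)                              # reconstructions
    sq_dist = torch.cdist(h, R) ** 2                # distances to each representative
    p = torch.softmax(-alpha * sq_dist, dim=1)      # soft assignments (one-hot as alpha -> inf)
    cluster_loss = (p * sq_dist).sum(dim=1).mean()
    recon_loss = ((x_hat - x) ** 2).sum(dim=1).mean()
    loss = recon_loss + lambda_ * cluster_loss      # one fully differentiable objective
    opt.zero_grad()
    loss.backward()                                 # gradients flow to encoder, decoder, and R jointly
    opt.step()
```

With a large fixed alpha the soft assignments are effectively one-hot, so the clustering term approaches the usual k-Means loss while remaining differentiable end-to-end; the paper's treatment of how this sharpness is set over training is omitted here.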
Related work
In the wake of the groundbreaking results obtained by DNNs in computer vision, several deep clustering algorithms were specifically designed for image clustering [7], [9], [13], [14], [30]. These works have in common the use of Convolutional Neural Networks (CNNs), which contributed extensively to the last decade's significant advances in computer vision. Inspired by agglomerative clustering, Yang et al. [30] proposed a recurrent process which successively merges clusters and learns image representations.
Deep k-Means
In the remainder, $x$ denotes an object from a set $\mathcal{X}$ of objects to be clustered; $\mathcal{Y}$ represents the space in which learned data representations are to be embedded; $K$ is the number of clusters to be obtained; $\mathbf{r}_k \in \mathcal{Y}$ is the representative of cluster $k$, $1 \le k \le K$; and $\mathcal{R} = \{\mathbf{r}_1, \dots, \mathbf{r}_K\}$ is the set of representatives. Functions $f$ and $g$ define some distance in $\mathcal{Y}$ and are assumed to be fully differentiable wrt their variables. For any vector $\mathbf{y} \in \mathcal{Y}$, $c_f(\mathbf{y}) \in \mathcal{R}$ gives the closest representative of $\mathbf{y}$ according to $f$.
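To make these definitions concrete, one way to smooth the closest-representative function is a temperature-controlled softmax; this is a hedged reconstruction consistent with the notation above, with the inverse temperature $\alpha$ an assumed parameterization rather than a quote of the paper's equations:

```latex
% Softmax relaxation of the hard assignment; G_k weighs representative r_k.
G_k(\mathbf{y}; \alpha, \mathcal{R}) =
  \frac{e^{-\alpha f(\mathbf{y}, \mathbf{r}_k)}}
       {\sum_{k'=1}^{K} e^{-\alpha f(\mathbf{y}, \mathbf{r}_{k'})}}
% As \alpha \to \infty, the weighted distance recovers the hard k-Means term:
\lim_{\alpha \to \infty} \sum_{k=1}^{K}
  G_k(\mathbf{y}; \alpha, \mathcal{R}) \, f(\mathbf{y}, \mathbf{r}_k)
  = f\big(\mathbf{y}, c_f(\mathbf{y})\big)
```

The weighted sum on the left is differentiable in both $\mathbf{y}$ and $\mathcal{R}$ for any finite $\alpha$, while its limit is exactly the distance to the closest representative, i.e., the per-point k-Means loss.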
Experiments
In order to evaluate the clustering results of our approach, we conducted experiments on different datasets and compared it against both standard and k-Means-related state-of-the-art deep clustering models.
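For reference, the evaluation protocol common to this literature reports clustering accuracy (ACC), the best accuracy over one-to-one mappings between clusters and classes found with the Hungarian algorithm, and normalized mutual information (NMI). The sketch below assumes this standard protocol rather than reproducing the paper's exact scripts.

```python
# Standard deep clustering metrics: ACC via the Hungarian algorithm and NMI.
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """Best accuracy over all one-to-one mappings of clusters to classes."""
    k = max(y_true.max(), y_pred.max()) + 1
    cost = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1                        # co-occurrence counts
    row, col = linear_sum_assignment(-cost)    # maximize total matched count
    return cost[row, col].sum() / len(y_true)

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([1, 1, 0, 0, 2, 2])          # same partition, permuted labels
print(clustering_accuracy(y_true, y_pred))             # 1.0
print(normalized_mutual_info_score(y_true, y_pred))    # 1.0
```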
Conclusion
We have presented in this paper a new approach for jointly clustering with k-Means and learning representations, based on considering the k-Means clustering loss as the limit of a differentiable function. While several studies have proposed solutions to this problem with different clustering losses, to the best of our knowledge, this is the first approach that truly jointly optimizes, through simple stochastic gradient descent updates, the representation and k-Means clustering losses. In addition to introducing this approach, we have conducted a careful empirical comparison of deep clustering methods, relying on the same auto-encoders, tuned hyperparameters, and shared initializations and minibatch sequences.
Declaration of Competing Interest
The authors declare that they have no conflict of interest.
References (30)
- K. Rose et al., A deterministic annealing approach to clustering, Pattern Recognit. Lett. (1990)
- E. Agustsson et al., Soft-to-hard vector quantization for end-to-end learning compressible representations, Proceedings of the 31st Annual Conference on Neural Information Processing Systems, NIPS '17 (2017)
- E. Aljalbout, V. Golkov, Y. Siddiqui, D. Cremers, Clustering with Deep Learning: Taxonomy and New Methods, ...
- D. Arthur et al., K-Means++: the advantages of careful seeding, Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '07 (2007)
- Y. Bengio et al., Greedy layer-wise training of deep networks, Proceedings of the 20th Annual Conference on Neural Information Processing Systems, NIPS '06 (2006)
- C.M. Bishop, Pattern Recognition and Machine Learning (2006)
- D. Cai et al., Locally consistent concept factorization for document clustering, IEEE Trans. Knowl. Data Eng. (2011)
- J. Chang et al., Deep adaptive image clustering, Proceedings of the 2017 IEEE International Conference on Computer Vision, ICCV '17 (2017)
- N. Dilokthanakul, P.A.M. Mediano, M. Garnelo, M.C.H. Lee, H. Salimbeni, K. Arulkumaran, M. Shanahan, Deep Unsupervised ...
- K. Ghasedi Dizaji et al., Deep clustering via joint convolutional autoencoder embedding and relative entropy minimization, Proceedings of the 2017 IEEE International Conference on Computer Vision, ICCV '17 (2017)
- X. Glorot et al., Understanding the difficulty of training deep feedforward neural networks, Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, AISTATS '10 (2010)
- X. Guo et al., Improved deep embedded clustering with local structure preservation, Proceedings of the 26th International Joint Conference on Artificial Intelligence, IJCAI '17 (2017)
- G.E. Hinton et al., Reducing the dimensionality of data with neural networks, Science (2006)
- C.-C. Hsu et al., CNN-based joint clustering and representation learning with feature drift compensation for large-scale image data, IEEE Trans. Multimed. (2018)
- W. Hu et al., Learning discrete representations via information maximizing self-augmented training, Proceedings of the 34th International Conference on Machine Learning, ICML '17 (2017)
☆ Handled by Associate Editor: Andrea Torsello.