Abstract
High-dimensional data clustering remains a challenging task for modern statistics and machine learning, with a wide range of applications. In this work, we consider the powerful discriminative latent mixture model and extend it to the Bayesian framework. Data are modeled as a mixture of Gaussians in a low-dimensional discriminative subspace, a Gaussian prior distribution is introduced over the latent group means, and a family of twelve submodels is derived by considering different covariance structures. Model inference is carried out with a variational EM algorithm, while the discriminative subspace is estimated via a Fisher-step maximizing an unsupervised Fisher criterion. An empirical Bayes procedure is proposed for estimating the prior hyper-parameters, and an integrated classification likelihood criterion is derived for selecting both the number of clusters and the submodel. The performance of the resulting Bayesian Fisher-EM algorithm is investigated in two thorough simulation scenarios, varying both dimensionality and noise level, and its superiority over state-of-the-art Gaussian subspace clustering models is assessed. In addition to standard real-data benchmarks, an application to single-image denoising is presented, with relevant results. A reference implementation accompanies the paper in an R package available on CRAN.
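The generative mechanism summarized in the abstract — Gaussian clusters living in a low-dimensional latent subspace, with Gaussian-distributed group means and isotropic ambient noise — can be sketched as follows. This is a minimal illustration only: all dimensions, scales, and noise levels below are assumed for the example, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: n observations in R^p, a d-dimensional latent
# discriminative subspace, and K clusters (all values are assumptions).
n, p, d, K = 300, 50, 2, 3

# Orthonormal basis of the latent subspace (columns of U span it).
U = np.linalg.qr(rng.normal(size=(p, d)))[0]

# Latent group means, drawn from a Gaussian prior as in the Bayesian extension.
mu = rng.normal(scale=3.0, size=(K, d))

# Cluster assignments and latent Gaussian signal around each group mean.
z = rng.integers(K, size=n)
Y = mu[z] + 0.5 * rng.normal(size=(n, d))

# Observed data: signal mapped to the ambient space plus isotropic noise.
X = Y @ U.T + 0.1 * rng.normal(size=(n, p))
```

Clustering then amounts to recovering both the subspace spanned by `U` and the labels `z` from `X` alone, which is what the Fisher-EM and Bayesian Fisher-EM algorithms do.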
Notes
The source code needed to reproduce the experiments is available and documented in the FisherEM CRAN package. More information is available at https://github.com/nicolasJouvin/FisherEM.
One could use \((p-d) \beta \) as the actual variance of the signal, taking into account the fact that there are \((p-d)\) noisy directions. However, since \(p- d\) is fixed here, it only acts as a scaling factor for the SNR.
Available at https://archive.ics.uci.edu/ml/datasets/Statlog+(Landsat+Satellite).
Acknowledgements
The authors wish to thank the anonymous referees for their thorough reading of the paper and their helpful comments. In addition, the authors thank Antoine Houdard for the images and for helpful discussions on image denoising. Finally, many thanks to Camille Noûs.
This work has benefited from the support of the French government, through the 3IA Côte d'Azur Investment in the Future project managed by the National Research Agency (ANR) under reference number ANR-19-P3IA-0002. This work was also supported by a DIM MathInnov grant from Région Île-de-France. The authors are finally thankful for the support of the fédération F2PM, CNRS FR 2036, Paris.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Jouvin, N., Bouveyron, C. & Latouche, P. A Bayesian Fisher-EM algorithm for discriminative Gaussian subspace clustering. Stat Comput 31, 44 (2021). https://doi.org/10.1007/s11222-021-10018-6