
A Bayesian Fisher-EM algorithm for discriminative Gaussian subspace clustering

Published in Statistics and Computing, volume 31, article 44 (2021)

Abstract

High-dimensional data clustering has become, and remains, a challenging task for modern statistics and machine learning, with a wide range of applications. In this work, we consider the powerful discriminative latent mixture model and extend it to the Bayesian framework. Modeling the data as a mixture of Gaussians in a low-dimensional discriminative subspace, we introduce a Gaussian prior distribution over the latent group means and derive a family of twelve submodels corresponding to different covariance structures. Model inference is carried out with a variational EM algorithm, while the discriminative subspace is estimated via a Fisher-step maximizing an unsupervised Fisher criterion. An empirical Bayes procedure is proposed for estimating the prior hyper-parameters, and an integrated classification likelihood criterion is derived for selecting both the number of clusters and the submodel. The performance of the resulting Bayesian Fisher-EM algorithm is investigated in two thorough simulation scenarios, varying both dimensionality and noise, which assess its superiority over state-of-the-art Gaussian subspace clustering models. In addition to standard real-data benchmarks, an application to single-image denoising is presented, with relevant results. A reference implementation accompanies the paper as the FisherEM R package, available on CRAN.
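To illustrate the Fisher-step mentioned above, the following sketch computes soft between- and within-class scatter matrices from cluster responsibilities and extracts an orthonormal discriminative basis via a generalized eigenproblem. This is not the authors' implementation: the function name, the regularization constant, and the use of a QR orthonormalization are illustrative assumptions, intended only to convey the structure of such a step.

```python
import numpy as np
from scipy.linalg import eigh

def fisher_step(X, T, d):
    """Illustrative Fisher-step sketch (hypothetical helper): given data
    X (n x p) and soft cluster responsibilities T (n x K), return an
    orthonormal basis U (p x d) maximizing a soft Fisher-type criterion."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    Xc = X - xbar
    nk = T.sum(axis=0)                       # soft cluster sizes
    means = (T.T @ X) / nk[:, None]          # soft cluster means (K x p)
    # Soft between-class scatter
    B = ((means - xbar).T * nk) @ (means - xbar) / n
    # Total scatter; soft within-class scatter W = S - B (Huygens decomposition)
    S = Xc.T @ Xc / n
    W = S - B
    # Generalized symmetric eigenproblem B u = lambda W u (small ridge for stability)
    vals, vecs = eigh(B, W + 1e-6 * np.eye(p))
    U = vecs[:, np.argsort(vals)[::-1][:d]]  # top-d discriminative directions
    # Orthonormalize, since Fisher-EM constrains U'U = I
    Q, _ = np.linalg.qr(U)
    return Q
```

In Fisher-EM-type algorithms this step alternates with an E-step updating the responsibilities `T` and an M-step updating the mixture parameters in the projected space.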



Notes

  1. The source code needed to reproduce the experiments is available and documented in the CRAN package. More information is available at https://github.com/nicolasJouvin/FisherEM.

  2. One could use \((p-d) \beta \) as the actual variance of the signal, taking into account the fact that there are \((p-d)\) noisy directions. However, since \(p- d\) is fixed here, it only acts as a scaling factor for the SNR.

  3. Available at https://archive.ics.uci.edu/ml/datasets/Statlog+(Landsat+Satellite).
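The remark in note 2 can be checked numerically: with \(p - d\) held fixed, replacing \(\beta\) by \((p-d)\beta\) as the signal variance multiplies every SNR by the same constant, so comparisons across settings are unchanged. The snippet below is a minimal sketch; the dimensions, \(\beta\) values, and noise variance are illustrative, not taken from the paper's experiments.

```python
# Illustrative check: with (p - d) fixed, using (p - d) * beta instead of
# beta as the signal variance only rescales the SNR by the constant p - d.
p, d = 50, 2
betas = [0.5, 1.0, 2.0]
noise_var = 1.0

snr_simple = [b / noise_var for b in betas]
snr_scaled = [(p - d) * b / noise_var for b in betas]

ratios = [s / t for s, t in zip(snr_scaled, snr_simple)]
print(ratios)  # each ratio equals p - d = 48
```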


Acknowledgements

The authors wish to thank the anonymous referees for their thorough reading of the paper and their helpful comments. In addition, the authors thank Antoine Houdard for the images and for helpful discussions on image denoising. Finally, many thanks to Camille Noûs.

This work has benefited from the support of the French government, through the 3IA Côte d'Azur Investment in the Future project managed by the National Research Agency (ANR) under reference number ANR-19-P3IA-0002. This work was also supported by a DIM MathInnov grant from Région Île-de-France. The authors are also grateful for the support of fédération F2PM, CNRS FR 2036, Paris.

Author information

Corresponding author

Correspondence to Nicolas Jouvin.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 366 KB)


About this article


Cite this article

Jouvin, N., Bouveyron, C. & Latouche, P. A Bayesian Fisher-EM algorithm for discriminative Gaussian subspace clustering. Stat Comput 31, 44 (2021). https://doi.org/10.1007/s11222-021-10018-6
