Abstract
Mini-batch algorithms have become increasingly popular owing to the need to solve optimization problems based on large-scale data sets. Using an existing online expectation–maximization (EM) algorithm framework, we demonstrate how mini-batch (MB) algorithms may be constructed, and we propose a scheme for the stochastic stabilization of the constructed mini-batch algorithms. Theoretical results regarding the convergence of the mini-batch EM algorithms are presented. We then demonstrate how the mini-batch framework may be applied to conduct maximum likelihood (ML) estimation of mixtures of exponential family distributions, with emphasis on ML estimation for mixtures of normal distributions. Via a simulation study, we demonstrate that the mini-batch algorithm for mixtures of normal distributions can outperform the standard EM algorithm. Further evidence of the performance of the mini-batch framework is provided via an application to the well-known MNIST data set.
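To make the construction concrete, the following is a minimal sketch, in Python with NumPy, of a single mini-batch EM update for a mixture of normal distributions, in the spirit of the online EM framework on which the paper builds. It is an illustration of the general idea only, not the authors' algorithm or implementation: the function names, the sufficient-statistics representation, and the step size `gamma` are assumptions made for this example.

```python
# Minimal sketch: one mini-batch EM update for a Gaussian mixture.
# Illustration only; not the paper's algorithm. All names and the
# step-size schedule `gamma` are assumptions made for this example.
import numpy as np


def e_step(X, weights, means, covs):
    """Posterior responsibilities of each component for each row of X."""
    n, _ = X.shape
    k = len(weights)
    log_resp = np.empty((n, k))
    for j in range(k):
        diff = X - means[j]
        prec = np.linalg.inv(covs[j])
        quad = np.einsum("ni,il,nl->n", diff, prec, diff)
        _, logdet = np.linalg.slogdet(covs[j])
        log_resp[:, j] = np.log(weights[j]) - 0.5 * (quad + logdet)
    log_resp -= log_resp.max(axis=1, keepdims=True)  # log-sum-exp trick
    resp = np.exp(log_resp)
    return resp / resp.sum(axis=1, keepdims=True)


def minibatch_em_step(batch, params, stats, gamma):
    """Blend the mini-batch's sufficient statistics into the running ones
    with step size gamma, then re-maximize (stochastic-approximation EM)."""
    weights, means, covs = params
    s0, s1, s2 = stats  # running E[z], E[z x], E[z x x^T]
    resp = e_step(batch, weights, means, covs)
    m = len(batch)
    b0 = resp.mean(axis=0)
    b1 = resp.T @ batch / m
    b2 = np.einsum("nj,ni,nl->jil", resp, batch, batch) / m
    s0 = (1 - gamma) * s0 + gamma * b0
    s1 = (1 - gamma) * s1 + gamma * b1
    s2 = (1 - gamma) * s2 + gamma * b2
    # M-step from the smoothed statistics.
    weights = s0 / s0.sum()
    means = s1 / s0[:, None]
    covs = s2 / s0[:, None, None] - np.einsum("ji,jl->jil", means, means)
    return (weights, means, covs), (s0, s1, s2)
```

In practice the step size would be decreased over iterations, e.g. γ_t ∝ t^(−α) with α ∈ (1/2, 1], the standard Robbins–Monro-type condition for stochastic approximation schemes; the running statistics would typically be initialized from a pass of standard EM on an initial batch.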
Acknowledgements
The authors are indebted to the Co-ordinating Editor and two Reviewers for their insightful comments that have improved the exposition of the manuscript. HDN is personally funded by Australian Research Council (ARC) Grant DE170101134. GJM and HDN are also funded under ARC Grant DP180101192. The work is supported by Inria project LANDER.
Cite this article
Nguyen, H.D., Forbes, F. & McLachlan, G.J. Mini-batch learning of exponential family finite mixture models. Stat Comput 30, 731–748 (2020). https://doi.org/10.1007/s11222-019-09919-4