Abstract
Mini-batch algorithms have become increasingly popular owing to the need to solve optimization problems based on large-scale data sets. Using an existing online expectation–maximization (EM) algorithm framework, we demonstrate how mini-batch (MB) algorithms may be constructed, and we propose a scheme for the stochastic stabilization of the constructed mini-batch algorithms. Theoretical results regarding the convergence of the mini-batch EM algorithms are presented. We then demonstrate how the mini-batch framework may be applied to conduct maximum likelihood (ML) estimation of mixtures of exponential family distributions, with emphasis on ML estimation for mixtures of normal distributions. Via a simulation study, we demonstrate that the mini-batch algorithm for mixtures of normal distributions can outperform the standard EM algorithm. Further evidence of the performance of the mini-batch framework is provided via an application to the well-known MNIST data set.
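To make the construction concrete, the following is a minimal sketch, in Python with NumPy, of a single mini-batch EM update for a mixture of normal distributions, in the spirit of the online EM framework on which the paper builds. It is an illustration of the general idea only, not the authors' algorithm or implementation: the function names, the sufficient-statistics representation, and the step size `gamma` are assumptions made for this example.

```python
# Minimal sketch: one mini-batch EM update for a Gaussian mixture.
# Illustration only; not the paper's algorithm. All names and the
# step-size schedule `gamma` are assumptions made for this example.
import numpy as np


def e_step(X, weights, means, covs):
    """Posterior responsibilities of each component for each row of X."""
    n, _ = X.shape
    k = len(weights)
    log_resp = np.empty((n, k))
    for j in range(k):
        diff = X - means[j]
        prec = np.linalg.inv(covs[j])
        quad = np.einsum("ni,il,nl->n", diff, prec, diff)
        _, logdet = np.linalg.slogdet(covs[j])
        log_resp[:, j] = np.log(weights[j]) - 0.5 * (quad + logdet)
    log_resp -= log_resp.max(axis=1, keepdims=True)  # log-sum-exp trick
    resp = np.exp(log_resp)
    return resp / resp.sum(axis=1, keepdims=True)


def minibatch_em_step(batch, params, stats, gamma):
    """Blend the mini-batch's sufficient statistics into the running ones
    with step size gamma, then re-maximize (stochastic-approximation EM)."""
    weights, means, covs = params
    s0, s1, s2 = stats  # running E[z], E[z x], E[z x x^T]
    resp = e_step(batch, weights, means, covs)
    m = len(batch)
    b0 = resp.mean(axis=0)
    b1 = resp.T @ batch / m
    b2 = np.einsum("nj,ni,nl->jil", resp, batch, batch) / m
    s0 = (1 - gamma) * s0 + gamma * b0
    s1 = (1 - gamma) * s1 + gamma * b1
    s2 = (1 - gamma) * s2 + gamma * b2
    # M-step from the smoothed statistics.
    weights = s0 / s0.sum()
    means = s1 / s0[:, None]
    covs = s2 / s0[:, None, None] - np.einsum("ji,jl->jil", means, means)
    return (weights, means, covs), (s0, s1, s2)
```

In practice the step size would be decreased over iterations, e.g. γ_t ∝ t^(−α) with α ∈ (1/2, 1], the standard Robbins–Monro-type condition for stochastic approximation schemes; the running statistics would typically be initialized from a pass of standard EM on an initial batch.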
Acknowledgements
The authors are indebted to the Co-ordinating Editor and two Reviewers for their insightful comments that have improved the exposition of the manuscript. HDN is personally funded by Australian Research Council (ARC) Grant DE170101134. GJM and HDN are also funded under ARC Grant DP180101192. The work is supported by Inria project LANDER.
Cite this article
Nguyen, H.D., Forbes, F. & McLachlan, G.J. Mini-batch learning of exponential family finite mixture models. Stat Comput 30, 731–748 (2020). https://doi.org/10.1007/s11222-019-09919-4