Parallel computing in linear mixed models

  • Original paper
Computational Statistics

Abstract

In this study, we propose a parallel programming method for linear mixed models (LMM) fitted to big data. The expectation maximization (EM) algorithm is commonly used for maximum likelihood estimation in LMM because its estimates are stable and the algorithm is simple. However, EM has a high computational cost. Our proposed method uses a divide-and-recombine approach: it splits the data into smaller subsets, runs the algorithm steps in parallel on multiple local cores, and combines the results. The method is used to fit LMM with dense and sparse parameters and with a large number of observations. It is faster than the classical approach and generalizes to big data. Supplementary sources for the proposed method are available in the R package lmmpar.
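The divide-and-recombine idea can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the lmmpar API. It assumes the simplest possible LMM, a one-way random-intercept model y_ij = mu + b_i + e_ij with b_i ~ N(0, tau2) and e_ij ~ N(0, sigma2), and it is written in Python rather than R. The function names (`simulate`, `chunk_stats`, `fit_em`) and all parameter values are hypothetical. Each worker runs the E-step on its own subset of groups and returns additive sufficient statistics; the M-step then needs only their sums.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
import random


def simulate(m=40, n_i=5, seed=0):
    """Simulate m groups of n_i observations (illustrative values only)."""
    rng = random.Random(seed)
    groups = []
    for _ in range(m):
        b = rng.gauss(0.0, 1.0)  # random intercept for this group
        groups.append([2.0 + b + rng.gauss(0.0, 0.5) for _ in range(n_i)])
    return groups


def chunk_stats(groups, mu, sigma2, tau2):
    """E-step on one subset: return additive sufficient statistics."""
    n = m = 0
    s_r = s_rr = s_tr = s_bb = 0.0
    for ys in groups:
        n_i = len(ys)
        # Posterior variance and mean of the random intercept b_i.
        v_i = 1.0 / (n_i / sigma2 + 1.0 / tau2)
        b_i = v_i * sum(y - mu for y in ys) / sigma2
        n += n_i
        m += 1
        s_r += sum(y - b_i for y in ys)          # sum of (y - b_hat)
        s_rr += sum((y - b_i) ** 2 for y in ys)  # sum of (y - b_hat)^2
        s_tr += n_i * v_i                        # posterior variance term
        s_bb += b_i ** 2 + v_i                   # E[b_i^2 | y]
    return (n, m, s_r, s_rr, s_tr, s_bb)


def fit_em(groups, n_chunks=4, n_iter=50):
    """EM with the E-step divided over subsets and recombined by summation."""
    # Divide: assign whole groups to subsets.
    chunks = [groups[k::n_chunks] for k in range(n_chunks)]
    mu, sigma2, tau2 = 0.0, 1.0, 1.0
    # Threads keep the sketch self-contained; in CPython a process pool
    # would be needed for a real multi-core speedup.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        for _ in range(n_iter):
            step = partial(chunk_stats, mu=mu, sigma2=sigma2, tau2=tau2)
            parts = list(pool.map(step, chunks))
            # Recombine: the statistics are additive across subsets.
            n, m, s_r, s_rr, s_tr, s_bb = (sum(col) for col in zip(*parts))
            # M-step from the pooled statistics.
            mu = s_r / n
            sigma2 = (s_rr - s_r ** 2 / n + s_tr) / n
            tau2 = s_bb / m
    return mu, sigma2, tau2


if __name__ == "__main__":
    print(fit_em(simulate(), n_chunks=4))
```

Because the statistics are simple sums, the number of subsets changes the estimates only by floating-point rounding. In this sketch the split affects only how the work is distributed, not what is estimated.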

Author information

Corresponding author

Correspondence to Fulya Gokalp Yavuz.

Cite this article

Gokalp Yavuz, F., Schloerke, B. Parallel computing in linear mixed models. Comput Stat 35, 1273–1289 (2020). https://doi.org/10.1007/s00180-019-00950-7

  • DOI: https://doi.org/10.1007/s00180-019-00950-7
