Published by De Gruyter, January 13, 2021

Gradient boosting for linear mixed models

Colin Griesbach, Benjamin Säfken and Elisabeth Waldmann

Abstract

Gradient boosting from the field of statistical learning is widely known as a powerful framework for the estimation and selection of predictor effects in various regression models, adapting concepts from classification theory. Current boosting approaches also offer methods accounting for random effects and thus enable the prediction of mixed models for longitudinal and clustered data. However, these approaches suffer from several flaws: unbalanced effect selection with falsely induced shrinkage and a low convergence rate on the one hand, and biased estimates of the random effects on the other. We therefore propose a new boosting algorithm which explicitly accounts for the random structure by excluding it from the selection procedure, properly correcting the random effects estimates and, in addition, providing likelihood-based estimation of the random effects variance structure. The new algorithm offers an organic and unbiased fitting approach, which is demonstrated via simulations and data examples.


Corresponding author: Colin Griesbach, Department of Medical Informatics, Biometry and Epidemiology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany, E-mail:

Funding source: DFG

Award Identifier / Grant number: Projekt WA 4249/2-1

Funding source: Volkswagen Foundation

Award Identifier / Grant number: Freigeist Fellowship

Acknowledgement

Colin Griesbach performed the present work in partial fulfilment of the requirements for obtaining the degree ‘Dr. rer. biol. hum.’ at the Friedrich-Alexander-Universität Erlangen-Nürnberg.

  1. Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.

  2. Research funding: This paper was funded by DFG (Projekt WA 4249/2-1) and Volkswagen Foundation (Freigeist Fellowship).

  3. Conflict of interest statement: The authors declare no conflicts of interest regarding this article.

Appendix

1 Formulating the correction matrix C

Due to the updating procedure, the random effects estimates $\hat{\gamma}$ need to be corrected in order to keep them uncorrelated with all other covariates and thus to obtain unbiased coefficient estimates $\hat{\beta}$ for the fixed effects. First, a set of covariates $X_{c_s}$, $s = 1, \ldots, q$, has to be specified for each random effect that is to be corrected. For a random intercept, $X_{c_s}$ contains a column of ones as well as one representative of every cluster-constant covariate. For a random slope, $X_{c_s}$ contains only a column of ones (which reduces the correction to centering the corresponding random effect) or, if interaction effects are included for the covariate the random slope is specified for, additional cluster-constant covariates. The single correction matrices can then be computed by

$$C_s = X_{c_s}\left(X_{c_s}^\top X_{c_s}\right)^{-1} X_{c_s}^\top, \quad s = 1, \ldots, q$$

and one obtains the block diagonal matrix $\tilde{C} = \operatorname{diag}(C_1, \ldots, C_q)$. The final correction matrix $C$ is then obtained via

$$C = P^{-1}\left(I_{nq} - \tilde{C}\right)P,$$

where $P$ is a permutation matrix mapping $\gamma$ to

$$P\gamma = \tilde{\gamma} = (\tilde{\gamma}_1, \ldots, \tilde{\gamma}_q)$$

with $\tilde{\gamma}_s = (\gamma_{s1}, \ldots, \gamma_{sn})$. Multiplication by $C$ corrects each random effect $s$ for the covariates contained in the corresponding matrix $X_{c_s}$ by subtracting the orthogonal projection of the $s$th random effect estimates onto the subspace spanned by the columns of $X_{c_s}$. This ensures that the random effects estimates are uncorrelated with any observed covariate.
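For illustration, the following R sketch assembles the correction matrix for a small toy setting with q = 2 random effects per cluster: a random intercept corrected for one cluster-constant covariate and a random slope corrected for a column of ones only. All object names (n, x_cc, X_c, gamma_hat) are illustrative; this is a minimal sketch of the formulas above, not the grbLMM implementation.

```r
## Minimal sketch of the correction step; names are illustrative, not taken
## from grbLMM. n clusters, q = 2 random effects per cluster.
set.seed(1)
n    <- 5                                # number of clusters
x_cc <- rnorm(n)                         # one cluster-constant covariate

## Covariate sets X_cs: intercept plus cluster-constant covariate for the
## random intercept, intercept only for the random slope (pure centering).
X_c <- list(cbind(1, x_cc), cbind(rep(1, n)))
q   <- length(X_c)

## Projection matrices C_s = X_cs (X_cs' X_cs)^{-1} X_cs'
C_s <- lapply(X_c, function(X) X %*% solve(crossprod(X)) %*% t(X))

## Block diagonal C~ = diag(C_1, ..., C_q)
C_tilde <- matrix(0, n * q, n * q)
for (s in seq_len(q)) {
  idx <- (s - 1) * n + seq_len(n)
  C_tilde[idx, idx] <- C_s[[s]]
}

## Permutation matrix P: gamma is grouped by cluster, gamma~ = P gamma is
## grouped by random effect, i.e. effect s of cluster i moves from position
## (i-1)*q + s to position (s-1)*n + i.
P <- matrix(0, n * q, n * q)
for (i in seq_len(n)) {
  for (s in seq_len(q)) {
    P[(s - 1) * n + i, (i - 1) * q + s] <- 1
  }
}

## Final correction matrix C = P^{-1} (I_nq - C~) P and its application to
## some (here randomly drawn) raw random effects estimates.
C <- solve(P) %*% (diag(n * q) - C_tilde) %*% P
gamma_hat       <- rnorm(n * q)
gamma_corrected <- C %*% gamma_hat

## Check: after the correction, the random intercepts are orthogonal to the
## columns of X_c[[1]] (all entries numerically zero).
gamma_tilde <- P %*% gamma_corrected
crossprod(X_c[[1]], gamma_tilde[1:n])
```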

2 Computational effort

Table 6 depicts the elapsed computation time of each simulation run from Section 2. The computational effort of mboost scales consistently with the number of candidate variables p, leading to a slightly faster runtime in the p = 10 case but considerably higher effort with increasing dimension. For grbLMM, stopping based on the AIC leads to a longer runtime, since computing the boosting hat matrix in every iteration is computationally more intensive.
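To make the extra cost of AIC-based stopping concrete, recall the hat matrix recursion for component-wise L2-boosting as given by Bühlmann and Hothorn [22]; this standard recursion is shown here only for orientation, and the exact operator used by grbLMM for its random effects part may differ. With learning rate $\nu$ and $H_{j_m}$ denoting the hat matrix of the base learner selected in iteration $m$,

$$\mathcal{B}_m = \mathcal{B}_{m-1} + \nu\, H_{j_m}\left(I_n - \mathcal{B}_{m-1}\right), \qquad \widehat{\operatorname{df}}(m) = \operatorname{trace}(\mathcal{B}_m),$$

so computing the degrees of freedom entering the AIC requires updating a dense $n \times n$ matrix in every iteration, whereas stopping at a fixed $m_{\text{stop}}$ or via resampling only requires fitted values.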

Table 6:

Elapsed computation time averaged over 100 simulation runs for each scenario. t_int denotes the runtime for random intercept setups and t_slp the runtime for setups with an additional random slope. Empty lme4 cells (marked –) indicate that no runtime is reported for p = 500.

τ     p   | lme4           | mboost         | grbLMM^a       | grbLMM^b
          | t_int   t_slp  | t_int   t_slp  | t_int   t_slp  | t_int   t_slp
0.4    10 |  0.15    0.30  |   224     327  |   433     505  |  1905    1936
0.4    25 |  0.17    0.40  |   501     602  |   446     517  |  1908    1939
0.4    50 |  0.22    0.72  |   954    1053  |   466     538  |  1912    1945
0.4   100 |  0.35    1.60  |  1868    1970  |   506     576  |  1922    1954
0.4   500 |     –       –  |  9505    9602  |   819     892  |  2344    2391
0.8    10 |  0.14    0.33  |   224     327  |   434     505  |  1904    1937
0.8    25 |  0.17    0.46  |   502     603  |   448     518  |  1908    1939
0.8    50 |  0.23    0.84  |   955    1053  |   467     537  |  1913    1944
0.8   100 |  0.37    1.88  |  1869    1972  |   507     576  |  1921    1955
0.8   500 |     –       –  |  9484    9480  |   820     890  |  2337    2380
1.6    10 |  0.15    0.49  |   223     327  |   434     502  |  1904    1936
1.6    25 |  0.18    0.73  |   502     603  |   447     517  |  1908    1939
1.6    50 |  0.25    1.33  |   956    1054  |   470     535  |  1912    1945
1.6   100 |  0.46    3.02  |  1869    1974  |   506     577  |  1922    1954
1.6   500 |     –       –  |  9464    9572  |   822     891  |  2358    2379

References

1. Laird, NM, Ware, JH. Random-effects models for longitudinal data. Biometrics 1982;38:963–74. https://doi.org/10.2307/2529876.

2. Anderssen, R, Bloomfield, P. A time series approach to numerical differentiation. Technometrics 1974;16:69–75. https://doi.org/10.1080/00401706.1974.10489151.

3. Wahba, G. A comparison of GCV and GML for choosing the smoothing parameter in the generalized spline smoothing problem. Ann Stat 1985;13:1378–402. https://doi.org/10.1214/aos/1176349743.

4. Wood, S. Generalized additive models: an introduction with R, 2nd ed. Boca Raton, FL: Chapman and Hall/CRC; 2017. https://doi.org/10.1201/9781315370279.

5. Bates, D, Mächler, M, Bolker, B, Walker, S. Fitting linear mixed-effects models using lme4. J Stat Software 2015;67:1–48. https://doi.org/10.18637/jss.v067.i01.

6. Pinheiro, J, Bates, D, DebRoy, S, Sarkar, D, R Core Team. nlme: linear and nonlinear mixed effects models; 2020. Available from: https://CRAN.R-project.org/package=nlme. R package version 3.1-148.

7. Crainiceanu, CM, Ruppert, D. Likelihood ratio tests in linear mixed models with one variance component. J Roy Stat Soc B 2004;66:165–85. https://doi.org/10.1111/j.1467-9868.2004.00438.x.

8. Vaida, F, Blanchard, S. Conditional Akaike information for mixed-effects models. Biometrika 2005;92:351–70. https://doi.org/10.1093/biomet/92.2.351.

9. Greven, S, Kneib, T. On the behaviour of marginal and conditional AIC in linear mixed models. Biometrika 2010;97:773–89. https://doi.org/10.1093/biomet/asq042.

10. Schelldorfer, J, Bühlmann, P, van de Geer, S. Estimation for high-dimensional linear mixed-effects models using ℓ1-penalization. Scand J Stat 2011;38:197–214. https://doi.org/10.1111/j.1467-9469.2011.00740.x.

11. Groll, A, Tutz, G. Variable selection for generalized linear mixed models by L1-penalized estimation. Stat Comput 2014;24:137–54. https://doi.org/10.1007/s11222-012-9359-z.

12. Hui, FK, Müller, S, Welsh, A. Joint selection in mixed models using regularized PQL. J Am Stat Assoc 2017;112:1323–33. https://doi.org/10.1080/01621459.2016.1215989.

13. Tibshirani, R. Regression shrinkage and selection via the lasso. J Roy Stat Soc B 1996;58:267–88. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x.

14. Friedman, J, Hastie, T, Tibshirani, R. Additive logistic regression: a statistical view of boosting (with discussion). Ann Stat 2000;28:337–407. https://doi.org/10.1214/aos/1016218223.

15. Bradic, J, Claeskens, G, Gueuning, T. Fixed effects testing in high-dimensional linear mixed models. J Am Stat Assoc 2019;115:1835–50. https://doi.org/10.1080/01621459.2019.1660172.

16. Freund, Y, Schapire, RE. Experiments with a new boosting algorithm. In: Proceedings of the thirteenth international conference on machine learning. San Francisco: Morgan Kaufmann; 1996:148–56 pp.

17. Breiman, L. Arcing classifiers (with discussion). Ann Stat 1998;26:801–49. https://doi.org/10.1214/aos/1024691079.

18. Breiman, L. Prediction games and arcing algorithms. Neural Comput 1999;11:1493–517. https://doi.org/10.1162/089976699300016106.

19. Friedman, J, Hastie, T, Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J Stat Software 2010;33:1–22. https://doi.org/10.18637/jss.v033.i01.

20. Hepp, T, Schmid, M, Gefeller, O, Waldmann, E, Mayr, A. Approaches to regularized regression – a comparison between gradient boosting and the lasso. Methods Inf Med 2016;55:422–30. https://doi.org/10.3414/ME16-01-0033.

21. Mayr, A, Binder, H, Gefeller, O, Schmid, M. The evolution of boosting algorithms – from machine learning to statistical modelling. Methods Inf Med 2014;53:419–27. https://doi.org/10.3414/ME13-01-0122.

22. Bühlmann, P, Hothorn, T. Boosting algorithms: regularization, prediction and model fitting. Stat Sci 2007;22:477–505. https://doi.org/10.1214/07-sts242.

23. Hothorn, T, Bühlmann, P, Kneib, T, Schmid, M, Hofner, B. mboost: model-based boosting; 2018. Available from: https://CRAN.R-project.org/package=mboost. R package version 2.9-1.

24. Kneib, T, Hothorn, T, Tutz, G. Variable selection and model choice in geoadditive regression models. Biometrics 2009;65:626–34. https://doi.org/10.1111/j.1541-0420.2008.01112.x.

25. Hofner, B, Mayr, A, Robinzonov, N, Schmid, M. Model-based boosting in R: a hands-on tutorial using the R package mboost. Comput Stat 2014;29:3–35. https://doi.org/10.1007/s00180-012-0382-5.

26. Waldmann, E, Taylor-Robinson, D, Klein, N, Kneib, T, Pressler, T, Schmid, M, et al. Boosting joint models for longitudinal and time-to-event data. Biom J 2017;59:1104–21. https://doi.org/10.1002/bimj.201600158.

27. Tutz, G, Binder, H. Generalized additive models with implicit variable selection by likelihood-based boosting. Biometrics 2006;62:961–71. https://doi.org/10.1111/j.1541-0420.2006.00578.x.

28. Tutz, G, Reithinger, F. A boosting approach to flexible semiparametric mixed models. Stat Med 2007;26:2872–900. https://doi.org/10.1002/sim.2738.

29. Groll, A. Variable selection by regularization methods for generalized mixed models [Ph.D. thesis]. Ludwig-Maximilians-Universität München; 2011.

30. Tutz, G, Groll, A. Generalized linear mixed models based on boosting. In: Kneib T, Tutz G, editors. Statistical modelling and regression structures – Festschrift in the honour of Ludwig Fahrmeir. Heidelberg: Physica; 2010:197–216 pp. https://doi.org/10.1007/978-3-7908-2413-1_11.

31. Griesbach, C, Groll, A, Waldmann, E. Addressing cluster-constant covariates in mixed effects models via likelihood-based boosting techniques. arXiv e-prints, arXiv:1912.06382. 2019.

32. Breslow, NE, Clayton, DG. Approximate inference in generalized linear mixed models. J Am Stat Assoc 1993;88:9–25. https://doi.org/10.1080/01621459.1993.10594284.

33. Schmid, M, Hothorn, T. Flexible boosting of accelerated failure time models. BMC Bioinf 2008;9:269. https://doi.org/10.1186/1471-2105-9-269.

34. Schmid, M, Hothorn, T, Maloney, KO, Weller, DE, Potapov, S. Geoadditive regression modeling of stream biological condition. Environ Ecol Stat 2010;18:709–33. https://doi.org/10.1007/s10651-010-0158-4.

35. Hothorn, T. Transformation boosting machines. Stat Comput 2019;30:141–52. https://doi.org/10.1007/s11222-019-09870-4.

36. Fahrmeir, L, Tutz, G. Multivariate statistical modelling based on generalized linear models, 2nd ed. New York: Springer-Verlag; 2001. https://doi.org/10.1007/978-1-4757-3454-6.

37. Hurvich, C, Simonoff, J, Tsai, C. Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. J Roy Stat Soc B 1998;60:271–93. https://doi.org/10.1111/1467-9868.00125.

38. Mayr, A, Hofner, B, Schmid, M. The importance of knowing when to stop – a sequential stopping rule for component-wise gradient boosting. Methods Inf Med 2012;51:178–86. https://doi.org/10.3414/ME11-02-0030.

39. Eilers, P, Marx, B. Flexible smoothing with B-splines and penalties. Stat Sci 1996;11:89–102. https://doi.org/10.1214/ss/1038425655.

40. Rigby, RA, Stasinopoulos, MD. Generalized additive models for location, scale and shape (with discussion). Appl Stat 2005;54:507–54. https://doi.org/10.1111/j.1467-9876.2005.00510.x.

41. Mayr, A, Fenske, N, Hofner, B, Kneib, T, Schmid, M. Generalized additive models for location, scale and shape for high-dimensional data – a flexible approach based on boosting. J Roy Stat Soc C Appl Stat 2012;61:403–27. https://doi.org/10.1111/j.1467-9876.2011.01033.x.


Supplementary Material

The online version of this article offers supplementary material (https://doi.org/10.1515/ijb-2020-0136).


Received: 2020-09-15
Accepted: 2020-12-07
Published Online: 2021-01-13

© 2020 Walter de Gruyter GmbH, Berlin/Boston
