Published by De Gruyter, January 13, 2021

Gradient boosting for linear mixed models

Colin Griesbach, Benjamin Säfken and Elisabeth Waldmann

Abstract

Gradient boosting from the field of statistical learning is widely known as a powerful framework for the estimation and selection of predictor effects in various regression models, adapting concepts from classification theory. Current boosting approaches also offer methods accounting for random effects and thus enable the prediction of mixed models for longitudinal and clustered data. However, these approaches suffer from several flaws: unbalanced effect selection with falsely induced shrinkage and a low convergence rate on the one hand, and biased estimates of the random effects on the other. We therefore propose a new boosting algorithm which explicitly accounts for the random structure by excluding it from the selection procedure, properly correcting the random effects estimates and, in addition, providing likelihood-based estimation of the random effects variance structure. The new algorithm offers an organic and unbiased fitting approach, which is demonstrated via simulations and data examples.


Corresponding author: Colin Griesbach, Department of Medical Informatics, Biometry and Epidemiology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany, E-mail:

Funding source: DFG

Award Identifier / Grant number: Projekt WA 4249/2-1

Funding source: Volkswagen Foundation

Award Identifier / Grant number: Freigeist Fellowship

Acknowledgement

Colin Griesbach performed the present work in partial fulfilment of the requirements for obtaining the degree ‘Dr. rer. biol. hum.’ at the Friedrich-Alexander-Universität Erlangen-Nürnberg.

  1. Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.

  2. Research funding: This paper was funded by DFG (Projekt WA 4249/2-1) and Volkswagen Foundation (Freigeist Fellowship).

  3. Conflict of interest statement: The authors declare no conflicts of interest regarding this article.

Appendix

1 Formulating the correction matrix C

Due to the updating procedure, the random effects estimates $\hat{\gamma}$ need to be corrected in order to keep them uncorrelated with all other covariates and thus to obtain unbiased coefficient estimates $\hat{\beta}$ for the fixed effects. First, a set of covariates $X_{c_s}$, $s = 1, \ldots, q$, has to be specified for each random effect that is to be corrected. For a random intercept, $X_{c_s}$ contains a column of ones as well as one representative of every cluster-constant covariate. For a random slope, $X_{c_s}$ contains only a column of ones (which reduces the correction to centering the corresponding random effect) or, if interaction effects are included for the covariate the random slope is specified for, additional cluster-constant covariates. The single correction matrices can then be computed by

$$C_s = X_{c_s}\left(X_{c_s}^\top X_{c_s}\right)^{-1} X_{c_s}^\top, \quad s = 1, \ldots, q$$

and one obtains the block diagonal matrix $\tilde{C} = \operatorname{diag}(C_1, \ldots, C_q)$. The final correction matrix $C$ is then obtained via

$$C = P^{-1}\left(I_{nq} - \tilde{C}\right)P,$$

where $P$ is a permutation matrix mapping $\gamma$ to

$$P\gamma = \tilde{\gamma} = (\tilde{\gamma}_1, \ldots, \tilde{\gamma}_q)$$

with $\tilde{\gamma}_s = (\gamma_{s1}, \ldots, \gamma_{sn})$. Multiplication by $C$ corrects each random effect $s$ for the covariates contained in the corresponding matrix $X_{c_s}$ by subtracting the orthogonal projection of the $s$th random effect estimates onto the subspace spanned by the columns of $X_{c_s}$. This ensures that the random effects estimates are uncorrelated with any observed covariate.
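For illustration, the following R sketch assembles the correction matrix for a small toy setting with q = 2 random effects per cluster: a random intercept corrected for one cluster-constant covariate and a random slope corrected for a column of ones only. All object names (n, x_cc, X_c, gamma_hat) are illustrative; this is a minimal sketch of the formulas above, not the grbLMM implementation.

```r
## Minimal sketch of the correction step; names are illustrative, not taken
## from grbLMM. n clusters, q = 2 random effects per cluster.
set.seed(1)
n    <- 5                                # number of clusters
x_cc <- rnorm(n)                         # one cluster-constant covariate

## Covariate sets X_cs: intercept plus cluster-constant covariate for the
## random intercept, intercept only for the random slope (pure centering).
X_c <- list(cbind(1, x_cc), cbind(rep(1, n)))
q   <- length(X_c)

## Projection matrices C_s = X_cs (X_cs' X_cs)^{-1} X_cs'
C_s <- lapply(X_c, function(X) X %*% solve(crossprod(X)) %*% t(X))

## Block diagonal C~ = diag(C_1, ..., C_q)
C_tilde <- matrix(0, n * q, n * q)
for (s in seq_len(q)) {
  idx <- (s - 1) * n + seq_len(n)
  C_tilde[idx, idx] <- C_s[[s]]
}

## Permutation matrix P: gamma is grouped by cluster, gamma~ = P gamma is
## grouped by random effect, i.e. effect s of cluster i moves from position
## (i-1)*q + s to position (s-1)*n + i.
P <- matrix(0, n * q, n * q)
for (i in seq_len(n)) {
  for (s in seq_len(q)) {
    P[(s - 1) * n + i, (i - 1) * q + s] <- 1
  }
}

## Final correction matrix C = P^{-1} (I_nq - C~) P and its application to
## some (here randomly drawn) raw random effects estimates.
C <- solve(P) %*% (diag(n * q) - C_tilde) %*% P
gamma_hat       <- rnorm(n * q)
gamma_corrected <- C %*% gamma_hat

## Check: after the correction, the random intercepts are orthogonal to the
## columns of X_c[[1]] (all entries numerically zero).
gamma_tilde <- P %*% gamma_corrected
crossprod(X_c[[1]], gamma_tilde[1:n])
```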

2 Computational effort

Table 6 depicts the elapsed computation time of each simulation run from Section 2. The computational effort of mboost scales consistently with the number of candidate variables p, leading to a slightly faster runtime in the p = 10 case but considerably higher effort with increasing dimension. For grbLMM, stopping based on the AIC leads to a longer runtime, since computing the boosting hat matrix in every iteration is computationally more intensive.
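To make the extra cost of AIC-based stopping concrete, recall the hat matrix recursion for component-wise L2-boosting as given by Bühlmann and Hothorn [22]; this standard recursion is shown here only for orientation, and the exact operator used by grbLMM for its random effects part may differ. With learning rate $\nu$ and $H_{j_m}$ denoting the hat matrix of the base learner selected in iteration $m$,

$$\mathcal{B}_m = \mathcal{B}_{m-1} + \nu\, H_{j_m}\left(I_n - \mathcal{B}_{m-1}\right), \qquad \widehat{\operatorname{df}}(m) = \operatorname{trace}(\mathcal{B}_m),$$

so computing the degrees of freedom entering the AIC requires updating a dense $n \times n$ matrix in every iteration, whereas stopping at a fixed $m_{\text{stop}}$ or via resampling only requires fitted values.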

Table 6:

Elapsed computation time averaged over 100 simulation runs for each scenario. t_int denotes the runtime for random intercept setups and t_slp the runtime for setups with an additional random slope. Empty lme4 cells (marked –) indicate that no runtime is reported for p = 500.

τ     p   | lme4           | mboost         | grbLMM^a       | grbLMM^b
          | t_int   t_slp  | t_int   t_slp  | t_int   t_slp  | t_int   t_slp
0.4    10 |  0.15    0.30  |   224     327  |   433     505  |  1905    1936
0.4    25 |  0.17    0.40  |   501     602  |   446     517  |  1908    1939
0.4    50 |  0.22    0.72  |   954    1053  |   466     538  |  1912    1945
0.4   100 |  0.35    1.60  |  1868    1970  |   506     576  |  1922    1954
0.4   500 |     –       –  |  9505    9602  |   819     892  |  2344    2391
0.8    10 |  0.14    0.33  |   224     327  |   434     505  |  1904    1937
0.8    25 |  0.17    0.46  |   502     603  |   448     518  |  1908    1939
0.8    50 |  0.23    0.84  |   955    1053  |   467     537  |  1913    1944
0.8   100 |  0.37    1.88  |  1869    1972  |   507     576  |  1921    1955
0.8   500 |     –       –  |  9484    9480  |   820     890  |  2337    2380
1.6    10 |  0.15    0.49  |   223     327  |   434     502  |  1904    1936
1.6    25 |  0.18    0.73  |   502     603  |   447     517  |  1908    1939
1.6    50 |  0.25    1.33  |   956    1054  |   470     535  |  1912    1945
1.6   100 |  0.46    3.02  |  1869    1974  |   506     577  |  1922    1954
1.6   500 |     –       –  |  9464    9572  |   822     891  |  2358    2379

References

1. Laird, NM, Ware, JH. Random-effects models for longitudinal data. Biometrics 1982;38:963–74. https://doi.org/10.2307/2529876.

2. Anderssen, R, Bloomfield, P. A time series approach to numerical differentiation. Technometrics 1974;16:69–75. https://doi.org/10.1080/00401706.1974.10489151.

3. Wahba, G. A comparison of GCV and GML for choosing the smoothing parameter in the generalized spline smoothing problem. Ann Stat 1985;13:1378–402. https://doi.org/10.1214/aos/1176349743.

4. Wood, S. Generalized additive models: an introduction with R, 2nd ed. Boca Raton, FL: Chapman and Hall/CRC; 2017. https://doi.org/10.1201/9781315370279.

5. Bates, D, Mächler, M, Bolker, B, Walker, S. Fitting linear mixed-effects models using lme4. J Stat Software 2015;67:1–48. https://doi.org/10.18637/jss.v067.i01.

6. Pinheiro, J, Bates, D, DebRoy, S, Sarkar, D, R Core Team. nlme: linear and nonlinear mixed effects models; 2020. Available from: https://CRAN.R-project.org/package=nlme. R package version 3.1-148.

7. Crainiceanu, CM, Ruppert, D. Likelihood ratio tests in linear mixed models with one variance component. J Roy Stat Soc B 2004;66:165–85. https://doi.org/10.1111/j.1467-9868.2004.00438.x.

8. Vaida, F, Blanchard, S. Conditional Akaike information for mixed-effects models. Biometrika 2005;92:351–70. https://doi.org/10.1093/biomet/92.2.351.

9. Greven, S, Kneib, T. On the behaviour of marginal and conditional AIC in linear mixed models. Biometrika 2010;97:773–89. https://doi.org/10.1093/biomet/asq042.

10. Schelldorfer, J, Bühlmann, P, van de Geer, S. Estimation for high-dimensional linear mixed-effects models using ℓ1-penalization. Scand J Stat 2011;38:197–214. https://doi.org/10.1111/j.1467-9469.2011.00740.x.

11. Groll, A, Tutz, G. Variable selection for generalized linear mixed models by L1-penalized estimation. Stat Comput 2014;24:137–54. https://doi.org/10.1007/s11222-012-9359-z.

12. Hui, FK, Müller, S, Welsh, A. Joint selection in mixed models using regularized PQL. J Am Stat Assoc 2017;112:1323–33. https://doi.org/10.1080/01621459.2016.1215989.

13. Tibshirani, R. Regression shrinkage and selection via the lasso. J Roy Stat Soc B 1996;58:267–88. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x.

14. Friedman, J, Hastie, T, Tibshirani, R. Additive logistic regression: a statistical view of boosting (with discussion). Ann Stat 2000;28:337–407. https://doi.org/10.1214/aos/1016218223.

15. Bradic, J, Claeskens, G, Gueuning, T. Fixed effects testing in high-dimensional linear mixed models. J Am Stat Assoc 2019;115:1835–50. https://doi.org/10.1080/01621459.2019.1660172.

16. Freund, Y, Schapire, RE. Experiments with a new boosting algorithm. In: Proceedings of the thirteenth international conference on machine learning. San Francisco: Morgan Kaufmann; 1996:148–56 pp.

17. Breiman, L. Arcing classifiers (with discussion). Ann Stat 1998;26:801–49. https://doi.org/10.1214/aos/1024691079.

18. Breiman, L. Prediction games and arcing algorithms. Neural Comput 1999;11:1493–517. https://doi.org/10.1162/089976699300016106.

19. Friedman, J, Hastie, T, Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J Stat Software 2010;33:1–22. https://doi.org/10.18637/jss.v033.i01.

20. Hepp, T, Schmid, M, Gefeller, O, Waldmann, E, Mayr, A. Approaches to regularized regression – a comparison between gradient boosting and the lasso. Methods Inf Med 2016;55:422–30. https://doi.org/10.3414/ME16-01-0033.

21. Mayr, A, Binder, H, Gefeller, O, Schmid, M. The evolution of boosting algorithms – from machine learning to statistical modelling. Methods Inf Med 2014;53:419–27. https://doi.org/10.3414/ME13-01-0122.

22. Bühlmann, P, Hothorn, T. Boosting algorithms: regularization, prediction and model fitting. Stat Sci 2007;22:477–505. https://doi.org/10.1214/07-sts242.

23. Hothorn, T, Bühlmann, P, Kneib, T, Schmid, M, Hofner, B. mboost: model-based boosting; 2018. Available from: https://CRAN.R-project.org/package=mboost. R package version 2.9-1.

24. Kneib, T, Hothorn, T, Tutz, G. Variable selection and model choice in geoadditive regression models. Biometrics 2009;65:626–34. https://doi.org/10.1111/j.1541-0420.2008.01112.x.

25. Hofner, B, Mayr, A, Robinzonov, N, Schmid, M. Model-based boosting in R: a hands-on tutorial using the R package mboost. Comput Stat 2014;29:3–35. https://doi.org/10.1007/s00180-012-0382-5.

26. Waldmann, E, Taylor-Robinson, D, Klein, N, Kneib, T, Pressler, T, Schmid, M, et al. Boosting joint models for longitudinal and time-to-event data. Biom J 2017;59:1104–21. https://doi.org/10.1002/bimj.201600158.

27. Tutz, G, Binder, H. Generalized additive models with implicit variable selection by likelihood-based boosting. Biometrics 2006;62:961–71. https://doi.org/10.1111/j.1541-0420.2006.00578.x.

28. Tutz, G, Reithinger, F. A boosting approach to flexible semiparametric mixed models. Stat Med 2007;26:2872–900. https://doi.org/10.1002/sim.2738.

29. Groll, A. Variable selection by regularization methods for generalized mixed models [Ph.D. thesis]. Ludwig-Maximilians-Universität München; 2011.

30. Tutz, G, Groll, A. Generalized linear mixed models based on boosting. In: Kneib T, Tutz G, editors. Statistical modelling and regression structures – Festschrift in the honour of Ludwig Fahrmeir. Heidelberg: Physica; 2010:197–216 pp. https://doi.org/10.1007/978-3-7908-2413-1_11.

31. Griesbach, C, Groll, A, Waldmann, E. Addressing cluster-constant covariates in mixed effects models via likelihood-based boosting techniques. arXiv e-prints, arXiv:1912.06382. 2019.

32. Breslow, NE, Clayton, DG. Approximate inference in generalized linear mixed models. J Am Stat Assoc 1993;88:9–25. https://doi.org/10.1080/01621459.1993.10594284.

33. Schmid, M, Hothorn, T. Flexible boosting of accelerated failure time models. BMC Bioinf 2008;9:269. https://doi.org/10.1186/1471-2105-9-269.

34. Schmid, M, Hothorn, T, Maloney, KO, Weller, DE, Potapov, S. Geoadditive regression modeling of stream biological condition. Environ Ecol Stat 2010;18:709–33. https://doi.org/10.1007/s10651-010-0158-4.

35. Hothorn, T. Transformation boosting machines. Stat Comput 2019;30:141–52. https://doi.org/10.1007/s11222-019-09870-4.

36. Fahrmeir, L, Tutz, G. Multivariate statistical modelling based on generalized linear models, 2nd ed. New York: Springer-Verlag; 2001. https://doi.org/10.1007/978-1-4757-3454-6.

37. Hurvich, C, Simonoff, J, Tsai, C. Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. J Roy Stat Soc B 1998;60:271–93. https://doi.org/10.1111/1467-9868.00125.

38. Mayr, A, Hofner, B, Schmid, M. The importance of knowing when to stop – a sequential stopping rule for component-wise gradient boosting. Methods Inf Med 2012;51:178–86. https://doi.org/10.3414/ME11-02-0030.

39. Eilers, P, Marx, B. Flexible smoothing with B-splines and penalties. Stat Sci 1996;11:89–102. https://doi.org/10.1214/ss/1038425655.

40. Rigby, RA, Stasinopoulos, MD. Generalized additive models for location, scale and shape (with discussion). Appl Stat 2005;54:507–54. https://doi.org/10.1111/j.1467-9876.2005.00510.x.

41. Mayr, A, Fenske, N, Hofner, B, Kneib, T, Schmid, M. Generalized additive models for location, scale and shape for high-dimensional data – a flexible approach based on boosting. J Roy Stat Soc C Appl Stat 2012;61:403–27. https://doi.org/10.1111/j.1467-9876.2011.01033.x.


Supplementary Material

The online version of this article offers supplementary material (https://doi.org/10.1515/ijb-2020-0136).


Received: 2020-09-15
Accepted: 2020-12-07
Published Online: 2021-01-13

© 2020 Walter de Gruyter GmbH, Berlin/Boston
