Abstract
Gradient boosting from the field of statistical learning is widely known as a powerful framework for the estimation and selection of predictor effects in various regression models, adapting concepts from classification theory. Current boosting approaches also offer methods accounting for random effects and thus enable the prediction of mixed models for longitudinal and clustered data. However, these approaches suffer from several flaws: unbalanced effect selection with falsely induced shrinkage and a slow convergence rate on the one hand, and biased estimates of the random effects on the other. We therefore propose a new boosting algorithm which explicitly accounts for the random structure by excluding it from the selection procedure, properly correcting the random effects estimates, and, in addition, providing likelihood-based estimation of the random effects variance structure. The new algorithm offers an organic and unbiased fitting approach, which is shown via simulations and data examples.
Funding source: DFG
Award Identifier / Grant number: Projekt WA 4249/2-1
Funding source: Volkswagen Foundation
Award Identifier / Grant number: Freigeist Fellowship
Acknowledgement
Colin Griesbach performed the present work in partial fulfilment of the requirements for obtaining the degree ‘Dr. rer. biol. hum.’ at the Friedrich-Alexander-Universität Erlangen-Nürnberg.
- Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
- Research funding: This paper was funded by the DFG (Projekt WA 4249/2-1) and the Volkswagen Foundation (Freigeist Fellowship).
- Conflict of interest statement: The authors declare no conflicts of interest regarding this article.
1 Formulating the correction matrix C
Due to the updating procedure, the random effects estimates and the block-diagonal structure they induce are defined by display equations not preserved in this version, in which P is a permutation matrix reordering the stacked random effects vector γ.
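As a generic illustration of the permutation step (a sketch under assumptions, not the paper's exact construction, whose display equations are not shown here): a permutation matrix P can reorder a stacked random effects vector γ from effect-wise grouping to cluster-wise grouping, so that the joint covariance P Q P' becomes block diagonal with one small block per cluster. The function name and the ordering convention below are hypothetical.

```python
import numpy as np

def interleave_permutation(n_clusters, q):
    """Build a permutation matrix P reordering a stacked random effects vector.

    Illustration only (hypothetical ordering, not the paper's definition):
    maps gamma grouped by effect type, (g_11,...,g_1n, g_21,...,g_2n, ...),
    to gamma grouped by cluster, (g_11, g_21, ..., g_12, g_22, ...), so that
    the covariance P Q P' becomes block diagonal with one q x q block
    per cluster.
    """
    size = n_clusters * q
    P = np.zeros((size, size))
    for i in range(n_clusters):
        for k in range(q):
            # row = target (cluster-grouped) index, col = source (effect-grouped) index
            P[i * q + k, k * n_clusters + i] = 1.0
    return P
```

With this ordering, an effect-grouped covariance kron(τ, I_n) is mapped to the block-diagonal kron(I_n, τ), i.e. one copy of the q × q random effects covariance τ per cluster.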
2 Computational effort
Table 6 depicts the elapsed computation time of each simulation run from Section 2. The computational effort for mboost scales approximately linearly with the number of candidate variables p, leading to a slightly faster runtime in the p = 10 case but higher effort for increasing dimensions. For grbLMM, stopping based on AIC leads to a longer runtime, since the boosting hat matrix has to be computed in every iteration, which is computationally intensive.
| τ | p | lme4 (t int) | lme4 (t slp) | mboost (t int) | mboost (t slp) | grbLMM a (t int) | grbLMM a (t slp) | grbLMM b (t int) | grbLMM b (t slp) |
|---|---|---|---|---|---|---|---|---|---|
| 0.4 | 10 | 0.15 | 0.30 | 224 | 327 | 433 | 505 | 1905 | 1936 |
| 0.4 | 25 | 0.17 | 0.40 | 501 | 602 | 446 | 517 | 1908 | 1939 |
| 0.4 | 50 | 0.22 | 0.72 | 954 | 1053 | 466 | 538 | 1912 | 1945 |
| 0.4 | 100 | 0.35 | 1.60 | 1868 | 1970 | 506 | 576 | 1922 | 1954 |
| 0.4 | 500 | – | – | 9505 | 9602 | 819 | 892 | 2344 | 2391 |
| 0.8 | 10 | 0.14 | 0.33 | 224 | 327 | 434 | 505 | 1904 | 1937 |
| 0.8 | 25 | 0.17 | 0.46 | 502 | 603 | 448 | 518 | 1908 | 1939 |
| 0.8 | 50 | 0.23 | 0.84 | 955 | 1053 | 467 | 537 | 1913 | 1944 |
| 0.8 | 100 | 0.37 | 1.88 | 1869 | 1972 | 507 | 576 | 1921 | 1955 |
| 0.8 | 500 | – | – | 9484 | 9480 | 820 | 890 | 2337 | 2380 |
| 1.6 | 10 | 0.15 | 0.49 | 223 | 327 | 434 | 502 | 1904 | 1936 |
| 1.6 | 25 | 0.18 | 0.73 | 502 | 603 | 447 | 517 | 1908 | 1939 |
| 1.6 | 50 | 0.25 | 1.33 | 956 | 1054 | 470 | 535 | 1912 | 1945 |
| 1.6 | 100 | 0.46 | 3.02 | 1869 | 1974 | 506 | 577 | 1922 | 1954 |
| 1.6 | 500 | – | – | 9464 | 9572 | 822 | 891 | 2358 | 2379 |
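The cost of AIC-based stopping comes from propagating the boosting hat matrix through every iteration. A minimal sketch of componentwise L2-boosting with hat-matrix-based corrected AIC (generic, not the grbLMM implementation; the function name and defaults are illustrative):

```python
import numpy as np

def boost_aic(X, y, nu=0.1, mstop=100):
    """Componentwise L2-boosting with corrected-AIC tracking.

    Generic sketch: each iteration fits every single-column least-squares
    base-learner to the current residuals, updates with the best one, and
    propagates the boosting hat matrix B_m = B_{m-1} + nu * H_j (I - B_{m-1})
    so that df = trace(B_m) enters the corrected AIC. The O(n^2) hat-matrix
    update per iteration is what makes AIC-based stopping expensive.
    """
    n, p = X.shape
    # per-column hat matrices H_j = x_j (x_j' x_j)^{-1} x_j'
    H = [np.outer(X[:, j], X[:, j]) / (X[:, j] @ X[:, j]) for j in range(p)]
    fit = np.zeros(n)
    B = np.zeros((n, n))
    aics = []
    for m in range(mstop):
        r = y - fit
        # select the base-learner with the smallest residual sum of squares
        rss = [np.sum((r - H[j] @ r) ** 2) for j in range(p)]
        j = int(np.argmin(rss))
        fit += nu * (H[j] @ r)
        B = B + nu * H[j] @ (np.eye(n) - B)  # propagate the boosting hat matrix
        df = np.trace(B)
        sigma2 = np.mean((y - fit) ** 2)
        # corrected AIC with effective degrees of freedom df
        aics.append(np.log(sigma2) + (1 + df / n) / (1 - (df + 2) / n))
    return int(np.argmin(aics)) + 1, aics
```

The returned index is the AIC-optimal stopping iteration; in contrast, stopping via cross-validation avoids the hat matrix entirely but requires repeated refitting.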
References
1. Laird, NM, Ware, JH. Random-effects models for longitudinal data. Biometrics 1982;38:963–74. https://doi.org/10.2307/2529876.
2. Anderssen, R, Bloomfield, P. A time series approach to numerical differentiation. Technometrics 1974;16:69–75. https://doi.org/10.1080/00401706.1974.10489151.
3. Wahba, G. A comparison of GCV and GML for choosing the smoothing parameter in the generalized spline smoothing problem. Ann Stat 1985;13:1378–402. https://doi.org/10.1214/aos/1176349743.
4. Wood, S. Generalized additive models: an introduction with R, 2nd ed. Boca Raton, FL: Chapman and Hall/CRC; 2017. https://doi.org/10.1201/9781315370279.
5. Bates, D, Mächler, M, Bolker, B, Walker, S. Fitting linear mixed-effects models using lme4. J Stat Software 2015;67:1–48. https://doi.org/10.18637/jss.v067.i01.
6. Pinheiro, J, Bates, D, DebRoy, S, Sarkar, D, R Core Team. nlme: linear and nonlinear mixed effects models; 2020. R package version 3.1-148. Available from: https://CRAN.R-project.org/package=nlme.
7. Crainiceanu, CM, Ruppert, D. Likelihood ratio tests in linear mixed models with one variance component. J Roy Stat Soc B 2004;66:165–85. https://doi.org/10.1111/j.1467-9868.2004.00438.x.
8. Vaida, F, Blanchard, S. Conditional Akaike information for mixed-effects models. Biometrika 2005;92:351–70. https://doi.org/10.1093/biomet/92.2.351.
9. Greven, S, Kneib, T. On the behaviour of marginal and conditional AIC in linear mixed models. Biometrika 2010;97:773–89. https://doi.org/10.1093/biomet/asq042.
10. Schelldorfer, J, Bühlmann, P, van de Geer, S. Estimation for high-dimensional linear mixed-effects models using l1-penalization. Scand J Stat 2011;38:197–214. https://doi.org/10.1111/j.1467-9469.2011.00740.x.
11. Groll, A, Tutz, G. Variable selection for generalized linear mixed models by l1-penalized estimation. Stat Comput 2014;24:137–54. https://doi.org/10.1007/s11222-012-9359-z.
12. Hui, FK, Müller, S, Welsh, A. Joint selection in mixed models using regularized PQL. J Am Stat Assoc 2017;112:1323–33. https://doi.org/10.1080/01621459.2016.1215989.
13. Tibshirani, R. Regression shrinkage and selection via the lasso. J Roy Stat Soc B 1996;58:267–88. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x.
14. Friedman, J, Hastie, T, Tibshirani, R. Additive logistic regression: a statistical view of boosting (with discussion). Ann Stat 2000;28:337–407. https://doi.org/10.1214/aos/1016218223.
15. Bradic, J, Claeskens, G, Gueuning, T. Fixed effects testing in high-dimensional linear mixed models. J Am Stat Assoc 2019;115:1835–50. https://doi.org/10.1080/01621459.2019.1660172.
16. Freund, Y, Schapire, RE. Experiments with a new boosting algorithm. In: Proceedings of the thirteenth international conference on machine learning theory. San Francisco: Morgan Kaufmann; 1996:148–56.
17. Breiman, L. Arcing classifiers (with discussion). Ann Stat 1998;26:801–49. https://doi.org/10.1214/aos/1024691079.
18. Breiman, L. Prediction games and arcing algorithms. Neural Comput 1999;11:1493–517. https://doi.org/10.1162/089976699300016106.
19. Friedman, J, Hastie, T, Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J Stat Software 2010;33:1–22. https://doi.org/10.18637/jss.v033.i01.
20. Hepp, T, Schmid, M, Gefeller, O, Waldmann, E, Mayr, A. Approaches to regularized regression: a comparison between gradient boosting and the lasso. Methods Inf Med 2016;55:422–30. https://doi.org/10.3414/ME16-01-0033.
21. Mayr, A, Binder, H, Gefeller, O, Schmid, M. The evolution of boosting algorithms – from machine learning to statistical modelling. Methods Inf Med 2014;53:419–27. https://doi.org/10.3414/ME13-01-0122.
22. Bühlmann, P, Hothorn, T. Boosting algorithms: regularization, prediction and model fitting. Stat Sci 2007;22:477–505. https://doi.org/10.1214/07-sts242.
23. Hothorn, T, Bühlmann, P, Kneib, T, Schmid, M, Hofner, B. mboost: model-based boosting; 2018. R package version 2.9-1. Available from: https://CRAN.R-project.org/package=mboost.
24. Kneib, T, Hothorn, T, Tutz, G. Variable selection and model choice in geoadditive regression models. Biometrics 2009;65:626–34. https://doi.org/10.1111/j.1541-0420.2008.01112.x.
25. Hofner, B, Mayr, A, Robinzonov, N, Schmid, M. Model-based boosting in R: a hands-on tutorial using the R package mboost. Comput Stat 2014;29:3–35. https://doi.org/10.1007/s00180-012-0382-5.
26. Waldmann, E, Taylor-Robinson, D, Klein, N, Kneib, T, Pressler, T, Schmid, M, et al. Boosting joint models for longitudinal and time-to-event data. Biom J 2017;59:1104–21. https://doi.org/10.1002/bimj.201600158.
27. Tutz, G, Binder, H. Generalized additive models with implicit variable selection by likelihood-based boosting. Biometrics 2006;62:961–71. https://doi.org/10.1111/j.1541-0420.2006.00578.x.
28. Tutz, G, Reithinger, F. A boosting approach to flexible semiparametric mixed models. Stat Med 2007;26:2872–900. https://doi.org/10.1002/sim.2738.
29. Groll, A. Variable selection by regularization methods for generalized mixed models [Ph.D. thesis]. Ludwig-Maximilians-Universität München; 2011.
30. Tutz, G, Groll, A. Generalized linear mixed models based on boosting. In: Kneib T, Tutz G, editors. Statistical modelling and regression structures – Festschrift in the honour of Ludwig Fahrmeir. Heidelberg: Physica; 2010:197–216. https://doi.org/10.1007/978-3-7908-2413-1_11.
31. Griesbach, C, Groll, A, Waldmann, E. Addressing cluster-constant covariates in mixed effects models via likelihood-based boosting techniques. arXiv e-prints, arXiv:1912.06382; 2019.
32. Breslow, NE, Clayton, DG. Approximate inference in generalized linear mixed models. J Am Stat Assoc 1993;88:9–25. https://doi.org/10.1080/01621459.1993.10594284.
33. Schmid, M, Hothorn, T. Flexible boosting of accelerated failure time models. BMC Bioinf 2008;9:269. https://doi.org/10.1186/1471-2105-9-269.
34. Schmid, M, Hothorn, T, Maloney, KO, Weller, DE, Potapov, S. Geoadditive regression modeling of stream biological condition. Environ Ecol Stat 2010;18:709–33. https://doi.org/10.1007/s10651-010-0158-4.
35. Hothorn, T. Transformation boosting machines. Stat Comput 2019;30:141–52. https://doi.org/10.1007/s11222-019-09870-4.
36. Fahrmeir, L, Tutz, G. Multivariate statistical modelling based on generalized linear models, 2nd ed. New York: Springer-Verlag; 2001. https://doi.org/10.1007/978-1-4757-3454-6.
37. Hurvich, C, Simonoff, J, Tsai, C. Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. J Roy Stat Soc B 1998;60:271–93. https://doi.org/10.1111/1467-9868.00125.
38. Mayr, A, Hofner, B, Schmid, M. The importance of knowing when to stop: a sequential stopping rule for component-wise gradient boosting. Methods Inf Med 2012;51:178–86. https://doi.org/10.3414/ME11-02-0030.
39. Eilers, P, Marx, B. Flexible smoothing with B-splines and penalties. Stat Sci 1996;11:89–102. https://doi.org/10.1214/ss/1038425655.
40. Rigby, RA, Stasinopoulos, MD. Generalized additive models for location, scale and shape (with discussion). Appl Stat 2005;54:507–54. https://doi.org/10.1111/j.1467-9876.2005.00510.x.
41. Mayr, A, Fenske, N, Hofner, B, Kneib, T, Schmid, M. Generalized additive models for location, scale and shape for high-dimensional data: a flexible approach based on boosting. J Roy Stat Soc C Appl Stat 2012;61:403–27. https://doi.org/10.1111/j.1467-9876.2011.01033.x.
Supplementary Material
The online version of this article offers supplementary material (https://doi.org/10.1515/ijb-2020-0136).
© 2020 Walter de Gruyter GmbH, Berlin/Boston