
Bayesian additive regression trees with model trees


Abstract

Bayesian additive regression trees (BART) is a tree-based machine learning method that has been successfully applied to regression and classification problems. BART places regularisation priors on a set of trees that act as weak learners, making it highly flexible for prediction in the presence of nonlinearity and high-order interactions. In this paper, we introduce an extension of BART, called model trees BART (MOTR-BART), that uses piecewise linear functions at the node level instead of piecewise constants. In MOTR-BART, rather than a single predicted value at each terminal node, a linear predictor is estimated using the covariates that have been chosen as split variables in the corresponding tree. In our approach, local linearities are captured more efficiently and fewer trees are required to achieve equal or better performance than BART. Via simulation studies and real data applications, we compare MOTR-BART to its main competitors. R code for the MOTR-BART implementation is available at https://github.com/ebprado/MOTR-BART.
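To make the node-level difference concrete, the sketch below contrasts the two kinds of prediction in R. This is our illustration under simplifying assumptions, not the authors' implementation: the function names are hypothetical and the least-squares fit stands in for the Bayesian update of the node-level coefficients described in the paper.

    # BART: one constant, repeated for every observation in the terminal node
    node_predict_bart <- function(y_node) {
      rep(mean(y_node), length(y_node))
    }

    # MOTR-BART: intercept plus linear terms in the tree's split covariates
    # (least squares here as a stand-in for the Bayesian coefficient update)
    node_predict_motr <- function(y_node, X_node, split_vars) {
      dat <- data.frame(y = y_node, X_node[, split_vars, drop = FALSE])
      fit <- lm(y ~ ., data = dat)
      unname(predict(fit))
    }

    set.seed(1)
    X <- matrix(runif(150), ncol = 3, dimnames = list(NULL, paste0("x", 1:3)))
    y <- 2 * X[, 1] + rnorm(50, sd = 0.1)
    head(node_predict_bart(y))           # flat within the node
    head(node_predict_motr(y, X, "x1"))  # varies linearly with x1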


References

  • Albert, J.H., Chib, S.: Bayesian analysis of binary and polychotomous response data. J. Am. Stat. Assoc. 88(422), 669–679 (1993)


  • Athey, S., Tibshirani, J., Wager, S.: Generalized random forests. Ann. Stat. 47(2), 1148–1178 (2019)


  • Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)


  • Carvalho, C.M., Polson, N.G., Scott, J.G.: The horseshoe estimator for sparse signals. Biometrika 97(2), 465–480 (2010)


  • Chipman, H.A., George, E.I., McCulloch, R.E.: Bayesian CART model search. J. Am. Stat. Assoc. 93(443), 935–948 (1998)


  • Chipman, H.A., George, E.I., McCulloch, R.E.: BART: Bayesian additive regression trees. Ann. Appl. Stat. 4(1), 266–298 (2010)


  • Deshpande, S.K., Bai, R., Balocchi, C., Starling, J.E.: VCBART: Bayesian trees for varying coefficients (2020). arXiv preprint arXiv:2003.06416

  • Friedberg, R., Tibshirani, J., Athey, S., Wager, S.: Local linear forests (2018). arXiv preprint arXiv:1807.11408

  • Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33(1), 1–22 (2010)


  • Friedman, J.H.: Multivariate adaptive regression splines. Ann. Stat. 19(1), 1–67 (1991)

  • Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Ann. Stat. 29(5), 1189–1232 (2001)

  • Green, D.P., Kern, H.L.: Modeling heterogeneous treatment effects in survey experiments with Bayesian additive regression trees. Public Opin. Quart. 76(3), 491–511 (2012)


  • Greenwell, B., Boehmke, B., Cunningham, J., GBM Developers: gbm: Generalized boosted regression models. https://CRAN.R-project.org/package=gbm, R package version 2.1.5 (2019)

  • Hahn, P.R., Murray, J.S., Carvalho, C.M.: Bayesian regression tree models for causal inference: regularization, confounding, and heterogeneous effects. Bayesian Anal. 15(3), 965–1056 (2020)

  • He, J., Yalov, S., Hahn, P.R.: XBART: Accelerated Bayesian additive regression trees. In: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, vol 89 (2019)

  • Hernández, B., Pennington, S.R., Parnell, A.C.: Bayesian methods for proteomic biomarker development. EuPA Open Proteom. 9, 54–64 (2015)


  • Hernández, B., Raftery, A.E., Pennington, S.R., Parnell, A.C.: Bayesian additive regression trees using Bayesian model averaging. Stat. Comput. 28(4), 869–890 (2018)


  • Hill, J.L.: Bayesian nonparametric modeling for causal inference. J. Comput. Graph. Stat. 20(1), 217–240 (2011)


  • Kapelner, A., Bleich, J.: bartMachine: Machine learning with Bayesian additive regression trees. J. Stat. Softw. 70(4), 1–40 (2016). https://doi.org/10.18637/jss.v070.i04


  • Kindo, B.P., Wang, H., Hanson, T., Peña, E.A.: Bayesian quantile additive regression trees (2016a). arXiv preprint arXiv:1607.02676

  • Kindo, B.P., Wang, H., Peña, E.A.: Multinomial probit Bayesian additive regression trees. Stat 5(1), 119–131 (2016b)


  • Künzel, S.R., Saarinen, T.F., Liu, E.W., Sekhon, J.S.: Linear aggregation in tree-based estimators (2019). arXiv preprint arXiv:1906.06463

  • Landwehr, N., Hall, M., Frank, E.: Logistic model trees. Mach. Learn. 59(1–2), 161–205 (2005)


  • Linero, A.: SoftBart: A package for implementing the SoftBart algorithm. R package version 1 (2017a)

  • Linero, A.R.: A review of tree-based Bayesian methods. Commun. Stat. Appl. Methods 24(6) (2017b)

  • Linero, A.R.: Bayesian regression trees for high-dimensional prediction and variable selection. J. Am. Stat. Assoc. 113(522), 626–636 (2018)


  • Linero, A.R., Yang, Y.: Bayesian regression tree ensembles that adapt to smoothness and sparsity. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 80(5), 1087–1110 (2018)


  • Linero, A.R., Sinha, D., Lipsitz, S.R.: Semiparametric mixed-scale models using shared Bayesian forests (2018). arXiv preprint arXiv:1809.08521

  • McCulloch, R., Sparapani, R., Gramacy, R., Spanbauer, C., Pratola, M.: BART: Bayesian Additive Regression Trees. https://CRAN.R-project.org/package=BART, R package version 2.7 (2019)

  • Murray, J.S.: Log-linear Bayesian additive regression trees for categorical and count responses (2017). arXiv preprint arXiv:1701.01503

  • Pratola, M., Chipman, H., George, E., McCulloch, R.: Heteroscedastic BART using multiplicative regression trees (2017). arXiv preprint arXiv:1709.07542

  • Quinlan, J.R.: Learning with continuous classes. In: 5th Australian Joint Conference on Artificial Intelligence, World Scientific, vol 92, pp 343–348 (1992)

  • R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna (2020). https://www.R-project.org/

  • Ročková, V., van der Pas, S.: Posterior concentration for Bayesian regression trees and forests (2017). arXiv preprint arXiv:1708.08734

  • Ročková, V., Saha, E.: On theory for BART (2018). arXiv preprint arXiv:1810.00787

  • Schnell, P.M., Tang, Q., Offen, W.W., Carlin, B.P.: A Bayesian credible subgroups approach to identifying patient subgroups with positive treatment effects. Biometrics 72(4), 1026–1036 (2016)


  • Sivaganesan, S., Müller, P., Huang, B.: Subgroup finding via Bayesian additive regression trees. Stat. Med. 36(15), 2391–2403 (2017)


  • Sparapani, R., Logan, B.R., McCulloch, R.E., Laud, P.W.: Nonparametric competing risks analysis using Bayesian additive regression trees. Stat. Methods Med. Res. (2019). Article 0962280218822140

  • Sparapani, R.A., Logan, B.R., McCulloch, R.E., Laud, P.W.: Nonparametric survival analysis using Bayesian additive regression trees (BART). Stat. Med. 35(16), 2741–2753 (2016)


  • Starling, J.E., Aiken, C.E., Murray, J.S., Nakimuli, A., Scott, J.G.: Monotone function estimation in the presence of extreme data coarsening: Analysis of preeclampsia and birth weight in urban Uganda (2019). arXiv preprint arXiv:1912.06946

  • Starling, J.E., Murray, J.S., Carvalho, C.M., Bukowski, R.K., Scott, J.G., et al.: BART with targeted smoothing: an analysis of patient-specific stillbirth risk. Ann. Appl. Stat. 14(1), 28–50 (2020)


  • Tibshirani, J., Athey, S., Wager, S.: grf: Generalized Random Forests (2020). https://CRAN.R-project.org/package=grf, R package version 1.2.0

  • Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc.: Ser. B (Methodol.) 58(1), 267–288 (1996)


  • Wang, Y., Witten, I., van Someren, M., Widmer, G.: Inducing model trees for continuous classes. In: Proceedings of the Poster Papers of the European Conference on Machine Learning, Department of Computer Science, University of Waikato, New Zealand (1997)

  • Wright, M.N., Ziegler, A.: ranger: A fast implementation of random forests for high dimensional data in C++ and R. J. Stat. Softw. 77(1), 1–17 (2017). https://doi.org/10.18637/jss.v077.i01


  • Zhang, J.L., Härdle, W.K.: The Bayesian additive classification tree applied to credit risk modelling. Comput. Stat. Data Anal. 54(5), 1197–1205 (2010)



Acknowledgements

We thank the editors and the two anonymous referees for their comments, which greatly improved an earlier version of the paper. This work was supported by a Science Foundation Ireland Career Development Award (grant number 17/CDA/4695).

Author information


Corresponding author

Correspondence to Estevão B. Prado.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Simulation results

In this section, we present results for the simulation scenarios described in Sect. 5.1. In total, 9 data sets were created from Friedman’s equation using combinations of sample size (n) and number of covariates (p). Tables 1 and 2 report the medians and quartiles of the RMSE for MOTR-BART, BART, GB, RF, lasso, soft BART and LLF; the same values are shown graphically in Fig. 3. In addition, Table 3 presents the mean number of parameters utilised by BART, MOTR-BART and soft BART to calculate the final prediction.
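For reference, the mean function in Friedman’s equation is \(10\sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10x_4 + 5x_5\), with Uniform(0, 1) covariates, where covariates beyond the first five do not enter the mean. A minimal R sketch of one such data set follows; the function name and noise level are illustrative assumptions rather than the exact simulation settings of Sect. 5.1.

    # One Friedman-equation data set (illustrative settings)
    sim_friedman <- function(n, p, sigma = 1) {
      stopifnot(p >= 5)
      X <- matrix(runif(n * p), nrow = n, ncol = p)
      mu <- 10 * sin(pi * X[, 1] * X[, 2]) + 20 * (X[, 3] - 0.5)^2 +
        10 * X[, 4] + 5 * X[, 5]
      list(X = X, y = mu + rnorm(n, sd = sigma))
    }

    train <- sim_friedman(n = 1000, p = 10)  # one (n, p) combination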

Table 1 Median of the RMSE on test data of the Friedman data sets when \(n = 200 \text{ and } 500\)
Table 2 Median of the RMSE on test data of the Friedman data sets when \(n = 1000\)
Table 3 Friedman data sets: mean and standard deviation of the total number of terminal nodes created by BART and soft BART to generate the final prediction over 5000 iterations

Appendix B: Real data results

This appendix presents two tables with results for the Ankara, Boston, Ozone and Compactiv data sets. Table 4 reports the median and quartiles of the RMSE computed on 10 test sets; the same values are shown graphically in Fig. 4 of Sect. 5.2. Further, Table 5 shows the mean number of parameters utilised by BART, MOTR-BART and soft BART to calculate the final prediction for these data sets.
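As a pointer to how the entries in Table 4 are computed, the sketch below summarises one algorithm's RMSE over the 10 test sets by its median and first and third quartiles; the random placeholder predictions are purely illustrative and would in practice come from each fitted model.

    # RMSE per test split, then its median and quartiles
    rmse <- function(y_obs, y_pred) sqrt(mean((y_obs - y_pred)^2))

    rmse_values <- replicate(10, rmse(rnorm(100), rnorm(100)))
    quantile(rmse_values, probs = c(0.25, 0.50, 0.75))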

Table 4 Real data sets: comparison of the median RMSE (and first and third quartiles) for Ankara, Boston, Ozone and Compactiv data sets on test data
Table 5 Real data sets: mean and standard deviation of the total number of terminal nodes created by BART and soft BART to generate the final prediction over 5000 iterations


About this article


Cite this article

Prado, E.B., Moral, R.A. & Parnell, A.C. Bayesian additive regression trees with model trees. Stat Comput 31, 20 (2021). https://doi.org/10.1007/s11222-021-09997-3
