Abstract
Count data appear in many scientific studies. In this article, we propose a regression tree method called CORE for analyzing such data. At each node, CORE fits a Poisson regression together with a count regression, such as a hurdle, negative binomial, or zero-inflated model, that can accommodate over-dispersion and/or excess zeros. A likelihood-based procedure is proposed for selecting split variables and split sets. Node deviance is then used in the tree pruning process to avoid overfitting. CORE is able to eliminate variable selection bias. In simulations and real data studies, we show that CORE has some advantages over the existing method, MOB.
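To make the ideas of node deviance and a likelihood-based split concrete, the following is a minimal sketch and not the CORE procedure itself. It assumes an intercept-only Poisson fit in each node and an exhaustive threshold search over a single numeric predictor. The function names `poisson_deviance` and `best_split`, and the `min_size` parameter, are hypothetical and chosen only for illustration. An exhaustive search of this kind is generally regarded as prone to variable selection bias, which is the problem the abstract says CORE eliminates by other means.

```python
import math


def poisson_deviance(counts):
    """Deviance of an intercept-only Poisson fit (fitted mean = sample mean)."""
    n = len(counts)
    if n == 0:
        return 0.0
    mean = sum(counts) / n
    dev = 0.0
    for y in counts:
        # The term y * log(y / mean) is taken to be 0 when y == 0.
        term = y * math.log(y / mean) if y > 0 else 0.0
        dev += 2.0 * (term - (y - mean))
    return dev


def best_split(x, y, min_size=2):
    """Illustrative exhaustive search: pick the threshold on x that
    minimizes the summed Poisson deviance of the two child nodes.

    Returns (threshold, total_child_deviance); threshold is None when
    no candidate split improves on the unsplit node.
    """
    pairs = sorted(zip(x, y))
    best = (None, poisson_deviance(y))
    for i in range(min_size, len(pairs) - min_size + 1):
        # Only split between distinct predictor values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [c for _, c in pairs[:i]]
        right = [c for _, c in pairs[i:]]
        total = poisson_deviance(left) + poisson_deviance(right)
        if total < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, total)
    return best


if __name__ == "__main__":
    # Toy data (made up for illustration): low counts for small x, high counts for large x.
    x = [1, 2, 3, 4, 5, 6]
    y = [0, 1, 0, 9, 10, 11]
    threshold, child_deviance = best_split(x, y)
    print(threshold, child_deviance, poisson_deviance(y))
```

In this toy example the split falls between x = 3 and x = 4, where the two groups of counts separate. A node model with covariates, or a hurdle, negative binomial, or zero-inflated likelihood, would replace `poisson_deviance` in a fuller treatment.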
Acknowledgements
The authors thank Dr. Chen-Hsin Chen of Academia Sinica for his insightful comments and suggestions during our discussions. We are very grateful to the two reviewers for their helpful comments. This research is partly supported by Taiwan MOST grant 105-2118-M-194-001 and the Domestic Visiting Scholar Program of Academia Sinica, Taiwan, contract 106-1-1-06-18.
Cite this article
Liu, NT., Lin, FC. & Shih, YS. Count regression trees. Adv Data Anal Classif 14, 5–27 (2020). https://doi.org/10.1007/s11634-019-00358-7
DOI: https://doi.org/10.1007/s11634-019-00358-7