
Count regression trees

  • Regular Article
Advances in Data Analysis and Classification

Abstract

Count data arise in many scientific studies. In this article, we propose a regression tree method called CORE for analyzing such data. At each node, CORE fits a Poisson regression and, in addition, a count regression such as a hurdle, negative binomial, or zero-inflated model that can accommodate over-dispersion and/or excess zeros. A likelihood-based procedure is suggested to select split variables and split sets. Node deviance is then used in the tree pruning process to avoid overfitting. CORE is able to eliminate variable selection bias. In simulations and real data studies, we show that CORE has some advantages over the existing method, MOB.
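To make the notion of node deviance concrete, the following is a minimal sketch, not the CORE implementation. It assumes the simplest possible node model, an intercept-only Poisson fit whose fitted mean is the sample mean, and measures how much the total deviance drops when a node is split at a threshold. The function names `poisson_deviance` and `split_deviance_reduction` and the toy data are hypothetical. The abstract describes CORE's split selection only as likelihood-based, so the threshold comparison here is purely illustrative.

```python
import math


def poisson_deviance(counts):
    """Deviance of an intercept-only Poisson fit (fitted mean = sample mean)."""
    mu = sum(counts) / len(counts)
    dev = 0.0
    for y in counts:
        # By convention, y * log(y / mu) is taken as 0 when y == 0.
        term = y * math.log(y / mu) if y > 0 else 0.0
        dev += 2.0 * (term - (y - mu))
    return dev


def split_deviance_reduction(x, y, threshold):
    """Parent deviance minus the summed deviance of the two child nodes."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty gains nothing
    return poisson_deviance(y) - (poisson_deviance(left) + poisson_deviance(right))


# Toy data (assumed for illustration): counts jump once x exceeds 3.
x = [1, 2, 3, 4, 5, 6]
y = [0, 1, 0, 5, 6, 7]
reduction = split_deviance_reduction(x, y, threshold=3)
print(f"deviance reduction at x <= 3: {reduction:.2f}")
```

A larger reduction means the two child nodes describe the counts better than the parent does. A deviance-based pruning step would weigh such gains against tree size; the abstract does not specify that rule.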



Acknowledgements

The authors thank Dr. Chen-Hsin Chen of Academia Sinica for his insightful comments and suggestions during our discussions. We are very grateful to the two reviewers for their helpful comments. This research is partly supported by Taiwan MOST grant 105-2118-M-194-001 and the Domestic Visiting Scholar Program of Academia Sinica, Taiwan, contract 106-1-1-06-18.

Author information

Corresponding author

Correspondence to Yu-Shan Shih.

About this article

Cite this article

Liu, NT., Lin, FC. & Shih, YS. Count regression trees. Adv Data Anal Classif 14, 5–27 (2020). https://doi.org/10.1007/s11634-019-00358-7

  • DOI: https://doi.org/10.1007/s11634-019-00358-7
