
Using recursive partitioning to find and estimate heterogenous treatment effects in randomized clinical trials

Journal of Experimental Criminology

Abstract

Objectives

Heterogeneous treatment effects that are inductively obtained from an RCT introduce significant complications. Special loss functions may be needed to find local average treatment effects, followed by techniques that properly address post-selection statistical inference.

Methods

Reanalyzing a recidivism RCT, we use a new form of classification trees to seek heterogeneous treatment effects and then correct for “data snooping” with novel inferential procedures.

Results

Recidivism may increase for a small subset of offenders whose risk factors place them toward the right tail of the risk distribution.

Conclusions

A legitimate but partial account of uncertainty might well reject the null hypothesis of no heterogeneous treatment effects. An equally legitimate but far more complete account of uncertainty for this study fails to reject that null hypothesis.


Fig. 1


Notes

  1. A local ATE is sometimes denoted by LATE. But that acronym is often associated with particular research settings. One example is estimating, for an RCT, the effect of a treatment on the subset of study subjects for whom the treatment received was not the treatment randomly assigned (Angrist 2004: C57). Imbens (2009) provides a spirited defense of LATE. Because our research setting is very different, we avoid the acronym.

  2. In some situations, some might favor using the null hypothesis value for the global ATE. For example, if one has already failed to reject the null hypothesis that ATE\(^{\ast }_{global} = 0.0\), one might argue for using 0.0 as the global ATE. The risk is that one is proceeding as if a statistical test already has shown that ATE\(^{\ast }_{global}\) actually is zero. Using the estimated global ATE to center the local ATEs is the option we prefer, although that introduces another source of uncertainty. Fortunately, because the global ATE is estimated from the full set of observations, the amount of uncertainty introduced may be very small, especially compared to the uncertainty in the local ATE estimates. Still, one could address that small uncertainty with bootstrap extensions of the procedures we later provide.
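The preferred option in the note above (centering local ATEs on the estimated global ATE, with a bootstrap to gauge the small added uncertainty) can be sketched as follows. This is a minimal illustration with simulated data and difference-in-means estimates, not the authors' implementation; the subgroup labels stand in for the partitions a tree might find.

```python
import numpy as np

rng = np.random.default_rng(0)

def ate(y, t):
    """Difference-in-means estimate of an average treatment effect."""
    return y[t == 1].mean() - y[t == 0].mean()

# Simulated data: outcome y, randomized treatment t, and subgroup labels g
# standing in for the partitions a tree might find.
n = 2000
t = rng.integers(0, 2, n)
g = rng.integers(0, 3, n)
y = 0.5 * t + rng.normal(size=n)

# Center each local ATE on the estimated global ATE.
global_ate = ate(y, t)
centered = {k: ate(y[g == k], t[g == k]) - global_ate for k in np.unique(g)}

# Bootstrap the global ATE to gauge the (small) extra uncertainty
# the centering step introduces.
boot = []
for _ in range(500):
    i = rng.integers(0, n, n)
    boot.append(ate(y[i], t[i]))
se_global = float(np.std(boot))
```

Because the global ATE uses all n observations, its bootstrap standard error is much smaller than the standard errors of the subgroup-level estimates, which is the point of the note.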

  3. Although beyond the scope of this paper, there are also model based methods for finding and estimating heterogeneous treatment effects. Some come from economics (Angrist 2004). But statisticians can like models too. Thomas and Bornkamp (2016) specify a suite of models in which beyond an indicator for the intervention, one includes the main effects of covariates and interaction effects between the covariates and the intervention. Both the main effects and interaction effects can be subject to different transformations from model to model. Nonparametric regression (e.g., cubic regression splines) is one example. Statistical tests are then used to determine which interaction effects are retained. To deal with the multiplicity of models, the authors suggest model averaging or beginning with one all encompassing model, and undertaking dimension reduction using the lasso.

  4. Lipkovich and colleagues consider a related idea using test statistics instead (2011).

  5. One can also tune for a good bias-variance tradeoff by stopping the partitioning after a very small number of splits or by pruning.

  6. By “pre-defined,” one means that all of the potential regressors are defined before the data analysis begins. They are not constructed as part of the data analysis, which is precisely what recursive partitioning does (i.e., indicator variables defining splits). For regression, their method combines a powerful variable selection procedure with proper controls for both the family-wise error rate and the false discovery rate.

  7. Although this can help if one is going to use the tree structure for subject-matter explanations, it is less clear whether it matters for estimates of local ATEs, especially given our preferred loss functions. For subsequent use, the data will be random IID realizations from the same joint probability distribution that includes the same variables measured in the same way.

  8. The connection to model selection becomes more apparent when one recalls that each split can be represented by an indicator variable. For each possible split, one seeks to find the best indicator variable, discarding all others.

  9. For example, if there are 400 possible partitions for each pass through the data, a depth-1 partitioning has 400 possible partitions. A depth-2 partitioning has 160,000 possible partitions. A depth-3 partitioning has over 64 million possible partitions. And 400 depth-1 partitions is not at all extreme. Suppose a single categorical variable, such as the kind of crime for which a person was arrested, has 8 possible categories. There are 254 possible partitions from that covariate alone.
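The arithmetic above can be checked directly. One way to reproduce the 254 figure is to count each side of every binary split of the 8 categories separately, i.e., every nonempty proper subset, giving 2^8 - 2:

```python
# Number of possible partitionings by depth, with 400 candidate splits
# per pass through the data.
splits_per_pass = 400
depth_1 = splits_per_pass          # 400
depth_2 = splits_per_pass ** 2     # 160,000
depth_3 = splits_per_pass ** 3     # 64,000,000 -- "over 64 million"

# An 8-category covariate: every nonempty proper subset of the
# categories defines a candidate partition, so there are 2^8 - 2.
categories = 8
candidate_partitions = 2 ** categories - 2   # 254
```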

  10. An R-package is in preparation. All possible stumps are constructed. Then, for each permutation of the treatment indicator, the maximum or minimum t-value over stumps is saved. If there are 1000 permutations, 1000 maximum or minimum t-values over stumps are saved from which an empirical sampling distribution can be constructed.
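In outline, the permutation scheme in the note above might look as follows. This is a sketch in Python with simulated data, assuming a single numeric covariate with a modest grid of candidate cutpoints; the package itself is in R and still in preparation, so none of these names come from it.

```python
import numpy as np

rng = np.random.default_rng(0)

def ate_t(y, t):
    """t-value of a difference-in-means treatment effect estimate."""
    y1, y0 = y[t == 1], y[t == 0]
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    return (y1.mean() - y0.mean()) / se

def max_t_over_stumps(y, t, x, cuts):
    """Maximum local-ATE t-value over all stumps defined by the cutpoints."""
    ts = []
    for c in cuts:
        for leaf in (x <= c, x > c):
            ts.append(ate_t(y[leaf], t[leaf]))
    return max(ts)

# Simulated RCT with a constant (homogeneous) treatment effect.
n = 500
x = rng.normal(size=n)        # a risk-score covariate
t = rng.integers(0, 2, n)     # randomized assignment
y = 0.3 * t + rng.normal(size=n)

cuts = np.quantile(x, np.linspace(0.1, 0.9, 9))
observed = max_t_over_stumps(y, t, x, cuts)

# Permute the treatment indicator; save the maximum t-value over stumps
# each time to build the empirical sampling distribution.
perm = [max_t_over_stumps(y, rng.permutation(t), x, cuts)
        for _ in range(1000)]
p_value = float(np.mean(np.array(perm) >= observed))
```

The analogous minimum-t distribution is obtained by replacing `max` with `min`; the interior quantile grid keeps every leaf large enough for a stable t-value.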

  11. Products of numeric variables can be challenging to interpret because there will often be several different values of constituent covariates that have the same product. Binning can help. Then one can more easily define indicator variables to be used as covariates that define particular splits (e.g., women under 30 years of age with 12 years of education). It is important to keep in mind that there is no model, and the enterprise is not explanation. One is searching for identifiable subsets of study units with unusually large or small ATEs.
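For instance, with hypothetical covariates invented for illustration:

```python
import numpy as np

# Hypothetical covariates for eight study subjects.
age = np.array([24, 41, 28, 55, 29, 33, 22, 61])
educ = np.array([12, 16, 12, 8, 14, 12, 12, 10])
female = np.array([1, 0, 1, 0, 1, 1, 0, 0], dtype=bool)

# age * educ maps different (age, educ) pairs to the same product, so
# the product itself is hard to interpret. An explicit indicator built
# from binned values names the subset directly: women under 30 years of
# age with 12 years of education.
subset = female & (age < 30) & (educ == 12)
```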

  12. This assumes that each arm of the experiment is assigned with the same probability. If not, the unequal assignment probabilities will be approximately reproduced within each partition.

  13. For minimum partition sizes of 150 and 200, the split values were substantially lower. Less troublesome offenders were being thrown into the mix.

  14. The interaction effect with gender, representing a tree depth of 2, was not selected.

  15. The absence of test data rules out a wide variety of other honest tests as well. For example, one could test particular contrasts, such as between the largest and next-largest heterogeneous treatment effects.

References

  • Ahlman, L.C., & Kurtz, E.M. (2009). The APPD randomized controlled trial in low risk supervision: the effects of low risk supervision on rearrest. Philadelphia: Adult Probation and Parole Department.

  • Angrist, J. (2004). Treatment effect heterogeneity in theory and practice. The Royal Economic Society Sargan Lecture, The Economic Journal, 114(494), C52–C83.

  • Athey, S., & Imbens, G. (2016). Recursive partitioning for heterogeneous causal effects. Proceedings of the National Academy of Sciences, 113(27), 7353–7360.

  • Athey, S., Tibshirani, J., & Wager, S. (2017). Generalized random forests. arXiv:1610.01271 [stat.ME].

  • Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature, 533, 452–454.

  • Barber, R.F., Candès, E.J., & Samworth, R.J. (2018). Robust inference with knockoffs. arXiv:1801.03896 [stat.ME].

  • Berk, R.A. (2016). Statistical learning from a regression perspective, 2nd edn. New York: Springer.

  • Berk, R.A., Barnes, G., Ahlman, L., & Kurtz, E. (2010). When second best is good enough: a comparison between a true experiment and a regression discontinuity quasi-experiment. Journal of Experimental Criminology, 6(2), 191–208.

  • Berk, R.A., Brown, L., Buja, A., George, E., Zhang, K., & Zhao, L. (2013). Valid post-selection inference. The Annals of Statistics, 41(2), 802–837.

  • Berk, R.A., Brown, L., Buja, A., George, E., Zhang, K., & Zhao, L. (2014). Covariance adjustments for the analysis of randomized field experiments. Evaluation Review, 34(3–4), 170–196.

  • Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32.

  • Breiman, L., Friedman, J.H., Olshen, R.A., & Stone, C.J. (1984). Classification and regression trees. Monterey: Wadsworth Press.

  • Buja, A., & Lee, Y.-S. (2001). Data mining criteria for tree-based regression and classification. In Proceedings of KDD (pp. 27–36).

  • Buja, A., Berk, R.A., Brown, L., George, E., Pitkin, E., Traskin, M., Zhao, L., & Zhang, K. (2019a). Models as approximations, part I—a conspiracy of nonlinearity and random regressors in linear regression. Statistical Science, forthcoming with discussion.

  • Buja, A., Berk, R.A., Brown, L., George, E., Pitkin, E., Traskin, M., Zhao, L., & Zhang, K. (2019b). Models as approximations, part II—a model-free theory of parametric regression. Statistical Science, forthcoming with discussion.

  • Cox, D.R. (1958). Planning of experiments. New York: Wiley.

  • Edgington, E.S., & Onghena, P. (2007). Randomization tests. New York: Chapman & Hall.

  • Fisher, R.A. (1935). The design of experiments. New York: Hafner Press.

  • Foster, J.C., Taylor, J.M.G., & Ruberg, S.J. (2011). Subgroup identification from randomized clinical trial data. Statistics in Medicine, 30, 2867–2880.

  • Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning, 2nd edn. New York: Springer.

  • Hinkelmann, K., & Kempthorne, O. (2008). Design and analysis of experiments, Vol. I. New York: Wiley.

  • Holland, P. (1986). Statistics and causal inference (with discussion). Journal of the American Statistical Association, 81(396), 945–970.

  • Hothorn, T., Hornik, K., & Zeileis, A. (2006). Unbiased recursive partitioning: a conditional inference framework. Journal of Computational and Graphical Statistics, 15(3), 651–674.

  • Imbens, G.W. (2009). Better LATE than nothing: some comments on Deaton (2009) and Heckman and Urzua (2009). Journal of Economic Literature, 48(2), 399–423.

  • Imbens, G.W., & Rubin, D.B. (2015). Causal inference for statistics, social, and biomedical sciences: an introduction. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9781139025751.

  • Johnson, V.E., Payne, R.D., Wang, T., & Mandal, S. (2016). On the reproducibility of psychological science. Journal of the American Statistical Association, 112(517), 1–10.

  • Kass, G.V. (1980). An exploratory technique for investigating large quantities of categorical data. Applied Statistics, 29(2), 119–127.

  • Kempthorne, O. (1955). The randomization theory of experimental inference. Journal of the American Statistical Association, 50, 946–967.

  • Lee, J.D., Sun, D.L., Sun, Y., & Taylor, J.E. (2016). Exact post-selection inference, with application to the lasso. The Annals of Statistics, 44(3), 907–927.

  • Leeb, H., & Pötscher, B.M. (2005). Model selection and inference: facts and fiction. Econometric Theory, 21, 21–59.

  • Leeb, H., & Pötscher, B.M. (2006). Can one estimate the conditional distribution of post-model-selection estimators? The Annals of Statistics, 34(5), 2554–2591.

  • Leeb, H., & Pötscher, B.M. (2008). Model selection. In Anderson, T.G., Davis, R.A., Kreib, J.-P., & Mikosch, T. (Eds.), The handbook of financial time series (pp. 785–821). New York: Springer.

  • Lipkovich, I., Dmitrienko, A., Denne, J., & Enas, G. (2011). Subgroup identification based on differential effect search – a recursive partitioning method for establishing response to treatment in patient subpopulations. Statistics in Medicine, 30, 2601–2621.

  • Loh, W.-Y. (2011). Classification and regression trees. WIREs Data Mining and Knowledge Discovery, 1(1), 14–23.

  • Loh, W.-Y., He, X., & Man, M. (2015). A regression tree approach to identifying subgroups with differential treatment effects. Statistics in Medicine, 34, 1818–1833.

  • McNutt, M. (2014). Reproducibility. Science, 343(6168), 229.

  • Meinshausen, N., Meier, L., & Bühlmann, P. (2009). p-values for high-dimensional regression. Journal of the American Statistical Association, 104(488), 1671–1681.

  • Quinlan, J.R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106.

  • Rubin, D.B. (1986). Which ifs have causal answers? Journal of the American Statistical Association, 81, 961–962.

  • Splawa-Neyman, J. (1990). On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science, 5(4), 465–472.

  • Su, X., Tsai, C.-L., Wang, H., Nickerson, D.M., & Li, B. (2009). Subgroup analysis via recursive partitioning. Journal of Machine Learning Research, 10, 141–158.

  • Swanson, S.A., Hernán, M., Miller, M., Robins, J.M., & Richardson, T.S. (2018). Partial identification of the average treatment effect using instrumental variables: review of methods for binary instruments, treatments, and outcomes. Journal of the American Statistical Association. https://doi.org/10.1080/01621459.2018.1434530.

  • Taylor, J.E., & Tibshirani, R.J. (2015). Statistical learning and selective inference. Proceedings of the National Academy of Sciences, 112(25), 7629–7634.

  • Thomas, M., & Bornkamp, B. (2016). Comparing approaches to treatment effect estimation for subgroups in clinical trials. arXiv:1603.03316 [stat.CO].

  • Vivalt, E. (2015). Heterogeneous treatment effects in impact evaluation. American Economic Review: Papers & Proceedings, 105(5), 467–470.

  • Wager, S., & Athey, S. (2017). Estimation and inference of heterogeneous treatment effects using random forests. arXiv:1510.04342 [stat.ME].


Author information

Corresponding author

Correspondence to Richard Berk.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Berk, R., Olson, M., Buja, A. et al. Using recursive partitioning to find and estimate heterogenous treatment effects in randomized clinical trials. J Exp Criminol 17, 519–538 (2021). https://doi.org/10.1007/s11292-019-09410-0
