
Abstract

Education research has experienced a methodological renaissance over the past two decades, with a new focus on large-scale randomized experiments. This wave of experiments has made education research an even more exciting area for statisticians, revealing many lessons and challenges in experimental design, causal inference, and statistics more broadly. Importantly, educational research and practice almost always take place in a multilevel setting, which makes these statistical lessons relevant to other fields with the same structure, including social policy, health services research, and clinical trials in medicine. In this article we first briefly review the history that led to this new era in education research and describe the design features that dominate modern large-scale educational experiments. We then highlight some of the key statistical challenges in this area, including endogeneity of design, heterogeneity of treatment effects, noncompliance with treatment assignment, mediation, generalizability, and spillover. Though it is a secondary focus, we also touch on promising trial designs that answer more nuanced questions, such as the sequential multiple assignment randomized trial (SMART) design for studying dynamic treatment regimes and factorial designs for optimizing the components of an existing treatment.


DOI: 10.1146/annurev-statistics-031219-041205
2020-03-07
2024-04-18


  • Article Type: Review Article