Dude, Where’s My Treatment Effect? Errors in Administrative Data Linking and the Destruction of Statistical Power in Randomized Experiments

  • Original Paper
  • Journal of Quantitative Criminology

Abstract

Objective

The increasing availability of large administrative datasets has led to an exciting innovation in criminal justice research—using administrative data to measure experimental outcomes in lieu of costly primary data collection. We demonstrate that this type of randomized experiment can have an unfortunate consequence: the destruction of statistical power. Combining experimental data with administrative records to track outcomes of interest typically requires linking datasets without a common identifier. In order to minimize mistaken linkages, researchers often use stringent linking rules like "exact matching" to ensure that speculative matches do not lead to errors in an analytic dataset. We show that this seemingly conservative approach leads to underpowered experiments, leaves real treatment effects undetected, and can therefore have profound implications for entire experimental literatures.

Methods

We derive an analytic result for the consequences of linking errors on statistical power and show how the problem varies across combinations of relevant inputs, including linking error rate, outcome density and sample size.

Results

Given that few experiments are overly well-powered, even small amounts of linking error can have considerable impact on Type II error rates. In contrast to exact matching, machine learning-based probabilistic matching algorithms allow researchers to recover a considerable share of the statistical power lost under stringent data-linking rules.

Conclusion

Our results demonstrate that probabilistic linking substantially outperforms stringent linking criteria. Failure to implement linking procedures designed to reduce linking errors can have dire consequences for subsequent analyses and, more broadly, for the viability of this type of experimental research.


Notes

  1. For important reviews of the limitations of experimental research especially with respect to external validity, see Berk (2005), Deaton (2010), Heckman and Smith (1995) and Sampson (2010). Also see Nagin and Sampson (2019) for a wonderfully nuanced and equally important discussion of the inherent challenges in identifying a policy-relevant counterfactual in an experimental design. For reviews of the ethical and legal considerations that are attendant in randomized experiments, we refer readers to thoughtful reviews by Boruch et al. (2000) and Weisburd (2003).

  2. Formerly known as the Laura and John Arnold Foundation.

  3. Statistical power is, in large part, a function of the available sample size, but it also depends on the amount of variation in the treatment and outcome variables. As Weisburd et al. (1993) noted some twenty-five years ago, and as Nelson et al. (2015) noted more recently, small-N studies are not necessarily more poorly powered than large-N studies empirically, though, other things equal, this will be the case.

  4. Concerns over the misuse of researcher degrees of freedom and specification searching have likewise spurred recommendations which include the use of very small \(\alpha\) levels (Benjamin et al. 2018), which increases the probability of Type II errors even more.

  5. A second advantage of administrative data is that it avoids the inherent challenges involved in working with self-reported data (Bertrand and Mullainathan 2001), though there are certainly trade-offs to using administrative data relative to self-reports. For empirical evaluations of the validity of self-reported data, see Lauritsen (1999), Morris and Slocum (2010) and Roberts and Wells (2010), among others.

  6. From 2017 to the present, approximately 16% of the field experiments published in Criminology, Journal of Quantitative Criminology, Journal of Experimental Criminology, Journal of Research in Crime and Delinquency and Justice Quarterly have taken the general form of a "low-cost" RCT.

  7. When a unique identifier is available in all of the datasets that require linking and the data are of sufficient quality, linking can, in some cases, be fairly trivial. These types of cross-system unique identifiers are frequently available in Scandinavian countries (e.g. Black et al. 2005; Dahl et al. 2014; Lyngstad and Skardhamar 2011; Wildeman and Andersen 2017) and occasionally in South America.

  8. A large literature considers the implications that measurement error can have for econometric models but, to our knowledge, there is considerably less formal guidance with respect to how bad data linking can confound randomized experiments. It is also worth noting that when scholars need to link datasets without a common identifier there is no “ground truth” to assess the quality of the match. Likewise, there is often no prior about what the match rate should be, rendering it difficult to diagnose whether the matching procedure employed is sufficient or not.

  9. For narrative clarity, we limit our discussion to the linking of data containing records on persons. This discussion would extend to groups or firms, but the characteristics available for linking might be different.

  10. We acknowledge that biometric data are susceptible to misidentification as well. However, the literature generally considers linking using biometric indicators as more accurate than the text-based demographic identifiers that we discuss below (Watson et al. 2014).

  11. Operationally, however, the end result of most probabilistic linking processes requires the imposition of a deterministic threshold to define potential pairs as links, non-links or, in some cases, potential links.

  12. In the Fellegi-Sunter framework, pairs of records are compared across their identifying characteristics (name, date of birth, gender, etc.) and a comparison vector is computed which encodes whether, for example, the names in the two records are the same, the name in one record is missing, the dates of birth are the same, and so on. Extensions to this framework include string distance calculations between names (e.g., Levenshtein, Jaro-Winkler) and phonetic encodings (e.g., Soundex, Double Metaphone); see the first sketch following these notes.

  13. In particular, the aim is to estimate the difference, possibly conditioned on covariates, in means between treatment and control groups in a randomized control trial. In a related paper, Moore et al. (2014) explore the impact of matching errors on the relative risk ratio. Matching errors bias these two quantities in different ways. As we show below, false positive and false negative rates have equal impact on bias in our scenario. Moore et al. show that false positive rates are more influential on the bias in relative risk ratio estimates.

  14. Recent specific examples from throughout the social sciences include program participation in Supplemental Nutritional Assistance Program (Courtemanche et al. 2018), employment prevalence measured through unemployment insurance wage records (Johnston and Mas 2018), injuries measured through hospitalization data (Powell and Seabury 2018), or financial health measured through bankruptcy or liens (Dobkin et al. 2018).

  15. One scenario where this assumption would not hold is if both linking errors and treatment effects vary by one or more subgroups. In the event that treatment effects and false linkage rates both vary by subgroup, the solution is slightly more complex and is explored in "Appendix 5".

  16. If \(TPR = FPR\) then the previous equation is undefined and the observed treatment effect will equal zero, but that situation is unlikely to occur in practice as it implies a random match.

  17. Ioannidis et al. (2017) show that the median statistical power for a large body of studies in economics, most of them observational, is just 18%.

  18. Here, \({\tau _h}\) refers to the candidate treatment effect for which statistical power will be computed. For smaller samples, \({{\Phi }}\) would be replaced by the cumulative distribution function for the t distribution.

  19. The computational details of this exercise are described in "Appendix 1" to this paper.

  20. The parameter r captures the strength of the relationship between X and \(y_{C}^{*}\). Therefore, as r increases in magnitude, statistical power increases, both in the absence and the presence of matching errors. However, the relative gain in statistical power is slightly larger when we do not condition on X. Across the parameterizations we examine, in the absence of a covariate, the average loss of power under matching errors is 8.4%. When r = 0.1, the loss of power is 8.8% when X is conditioned on; when r = 0.3, the average loss of power under matching errors is 11.9% when X is conditioned on. Hence, while a larger r is uniformly power enhancing, it does mean that controlling for a covariate will be slightly less helpful in maximizing statistical power than it otherwise would be.

  21. The probabilistic matching approaches we deploy in this section also take advantage of the latest advancements in the field of machine learning, for two primary reasons. First, administrative datasets often span hundreds of thousands, and sometimes millions, of records. Probabilistic techniques involve computing similarity metrics across a number of identifying characteristics such as name and date of birth, and it becomes computationally prohibitive to perform these calculations for each potential record pair as the administrative dataset grows. Ideally, we would only perform these computations for records with a high prior probability of referring to the same person. Techniques for detecting approximate nearest neighbors (Sadosky et al. 2015) allow for fast detection of likely matches and drastically reduce the number of comparisons that need to be made in the linking process (see the second sketch following these notes). Second, the adaptivity of machine learning models for learning non-linear functions, and the practice of assessing performance on out-of-sample data, lead to predictive accuracy that outperforms linear models such as logistic regression.

  22. There are situations where the two rows in the dataset will match on all relevant variables save for the unique identifier. As it is ambiguous whether these rows refer to different individuals or if there is an error in the unique identifier, we drop these records from the empirical simulation. This reduces the number of records to 2.6 million.

  23. https://github.com/dedupeio/dedupe.

  24. RecordLink works by identifying potential matches across the two data sets and asking for human labels for the pairs about which the algorithm is most uncertain. This information is then incorporated into the learning algorithm to improve predictions. A user providing labels has the option to stop at any point and have dedupe produce predictions based on the current version of the algorithm. To simulate a human providing responses, we modified dedupe's code so that ground truth labels would be provided until either of the following conditions was met: the number of labels provided was equal to 50% of the experimental data set size, or the number of labels identifying true positive links was greater than or equal to 75% of the number of true matches. For 95% of simulations, the number of labels provided was greater than 50% of the experimental data set size, and in 15% of simulations the number of labels provided was greater than 75% of the experimental data set size.

  25. Further details of the algorithm will appear in Jelveh and McNeill (2018).

  26. While ground truth data for record linkage is often hard to come by, in the context of low-cost RCTs the administrative data set being linked to may well meet the conditions needed to deploy a supervised approach. In particular, the conditions needed for supervised learning are that the administrative data set contains a unique identifier (such as an agency identifier assigned by a police department, public hospital, or school system) and that a person can appear multiple times in the data set with the same unique identifier but with discrepancies between records in identifying characteristics (see the third sketch following these notes).

  27. We note that in a very small number of parameterizations, the share of errors overturned is negative, indicating that exact matching leads to fewer linking errors than active learning. Common features of these parameterizations include low exact-matching error rates, low overlap between the experimental and administrative datasets, and/or larger administrative datasets.

  28. It is worthwhile to note that the descriptive consequences of linking error cannot be resolved by increasing sample size.
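Several of the linking operations described in these notes can be sketched in code. First, a minimal sketch of the comparison-vector construction described in note 12, assuming the open-source jellyfish library for the string-distance and phonetic functions; the record fields and layout are illustrative, not the paper's implementation (and jaro_winkler_similarity is named jaro_winkler in older jellyfish releases):

```python
import jellyfish

def comparison_vector(rec_a, rec_b):
    """Encode field-by-field agreement for a candidate record pair,
    Fellegi-Sunter style, using exact, string-distance and phonetic
    comparisons."""
    return {
        # exact agreement on date of birth and gender
        "dob_match": rec_a["dob"] == rec_b["dob"],
        "gender_match": rec_a["gender"] == rec_b["gender"],
        # edit distance and Jaro-Winkler similarity on last name
        "last_lev": jellyfish.levenshtein_distance(rec_a["last"], rec_b["last"]),
        "last_jw": jellyfish.jaro_winkler_similarity(rec_a["last"], rec_b["last"]),
        # phonetic (Soundex) agreement on first name, so "Jon" matches "John"
        "first_sdx": jellyfish.soundex(rec_a["first"]) == jellyfish.soundex(rec_b["first"]),
    }

vec = comparison_vector(
    {"first": "Jon", "last": "Smith", "dob": "1990-01-01", "gender": "M"},
    {"first": "John", "last": "Smyth", "dob": "1990-01-01", "gender": "M"},
)
```

Next, a sketch of the candidate-generation step described in note 21. Blocking on a cheap deterministic key is a simpler cousin of the approximate-nearest-neighbor methods cited there, but it illustrates the same point: similarity scores are computed only within blocks rather than for every cross-dataset pair:

```python
from collections import defaultdict
import jellyfish

def candidate_pairs(experimental, administrative):
    """Yield (i, j) index pairs whose records share a blocking key
    (Soundex of last name plus birth year), avoiding the full
    len(experimental) * len(administrative) comparison."""
    blocks = defaultdict(list)
    for j, rec in enumerate(administrative):
        blocks[(jellyfish.soundex(rec["last"]), rec["dob"][:4])].append(j)
    for i, rec in enumerate(experimental):
        for j in blocks.get((jellyfish.soundex(rec["last"]), rec["dob"][:4]), []):
            yield i, j  # candidate pair for detailed comparison
```

Finally, a sketch of the supervised setup described in note 26, where repeated appearances of the same agency identifier with discrepant identifying fields yield labeled training pairs (the field names are hypothetical):

```python
from itertools import combinations
import random

def training_pairs(admin_records, n_negatives=10000, seed=0):
    """Build labeled pairs for a supervised linker: pairs of records
    sharing an agency ID are positive examples; random cross-ID pairs
    serve as negatives."""
    by_id = {}
    for rec in admin_records:
        by_id.setdefault(rec["agency_id"], []).append(rec)
    positives = [pair for recs in by_id.values() if len(recs) > 1
                 for pair in combinations(recs, 2)]
    rng = random.Random(seed)
    negatives = []
    while len(negatives) < n_negatives:
        a, b = rng.sample(admin_records, 2)
        if a["agency_id"] != b["agency_id"]:
            negatives.append((a, b))
    return positives, negatives
```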

References

  • Abramitzky R, Boustan LP, Eriksson K, Feigenbaum JJ, Pérez S (2019) Automated linking of historical data. Working paper 25825, National Bureau of Economic Research
  • Aigner DJ (1973) Regression with a binary independent variable subject to errors of observation. J Econom 1(1):49–59
  • Angrist JD, Pischke J-S (2009) Mostly harmless econometrics. Princeton University Press, Princeton, NJ
  • Arceneaux K, Gerber AS, Green DP (2010) A cautionary note on the use of matching to estimate causal effects: an empirical example comparing matching estimates to an experimental benchmark. Sociol Methods Res 39(2):256–282
  • Bailey M, Cole C, Henderson M, Massey C (2017) How well do automated methods perform in historical samples? Evidence from new ground truth. Working paper 24019, National Bureau of Economic Research
  • Banerjee AV, Duflo E (2009) The experimental approach to development economics. Annu Rev Econ 1(1):151–178
  • Benjamin DJ, Berger JO, Johannesson M, Nosek BA, Wagenmakers EJ, Berk R, Bollen KA, Brembs B, Brown L, Camerer C, Cesarini D, Chambers CD, Clyde M, Cook TD, De Boeck P, Dienes Z, Dreber A, Easwaran K, Efferson C, Fehr E, Fidler F, Field AP, Forster M, George EI, Gonzalez R, Goodman S, Green E, Green DP, Greenwald AG, Hadfield JD, Hedges LV, Held L, Hua Ho T, Hoijtink H, Hruschka DJ, Imai K, Imbens G, Ioannidis JPA, Jeon M, Jones JH, Kirchler M, Laibson D, List J, Little R, Lupia A, Machery E, Maxwell SE, McCarthy M, Moore DA, Morgan SL, Munafó M, Nakagawa S, Nyhan B, Parker TH, Pericchi L, Perugini M, Rouder J, Rousseau J, Savalei V, Schönbrodt FD, Sellke T, Sinclair B, Tingley D, Van Zandt T, Vazire S, Watts DJ, Winship C, Wolpert RL, Xie Y, Young C, Zinman J, Johnson VE (2018) Redefine statistical significance. Nat Hum Behav 2(1):6–10
  • Berent MK, Krosnick JA, Lupia A (2016) Measuring voter registration and turnout in surveys: do official government records yield more accurate assessments? Public Opin Quart 80(3):597–621
  • Berk RA (2005) Randomized experiments as the bronze standard. J Exp Criminol 1(4):417–433
  • Bertrand M, Mullainathan S (2001) Do people mean what they say? Implications for subjective survey data. Am Econ Rev 91(2):67–72
  • Bilenko M (2004) Learnable similarity functions and their applications to clustering and record linkage. In: Proceedings of the ninth AAAI/SIGART doctoral consortium, pp 981–982
  • Black SE, Devereux PJ, Salvanes KG (2005) Why the apple doesn't fall far: understanding intergenerational transmission of human capital. Am Econ Rev 95(1):437–449
  • Boruch RF, Victor T, Cecil JS (2000) Resolving ethical and legal problems in randomized experiments. Crime Delinq 46(3):330–353
  • Bowers KJ, Johnson SD (2005) Domestic burglary repeats and space-time clusters: the dimensions of risk. Eur J Criminol 2(1):67–92
  • Braga AA, Apel R (2016) And we wonder why criminology is sometimes considered irrelevant in real-world policy conversations. Criminol Public Policy 15(3):813–829
  • Braga AA, Bond BJ (2008) Policing crime and disorder hot spots: a randomized controlled trial. Criminology 46(3):577–607
  • Braga AA, Weisburd DL, Waring EJ, Mazerolle LG, Spelman W, Gajewski F (1999) Problem-oriented policing in violent crime places: a randomized controlled experiment. Criminology 37(3):541–580
  • Branas CC, South E, Kondo MC, Hohl BC, Bourgois P, Wiebe DJ, MacDonald JM (2018) Citywide cluster randomized trial to restore blighted vacant land and its effects on violence, crime, and fear. Proc Natl Acad Sci 115(12):2946–2951
  • Breiman L (2001) Random forests. Mach Learn 45(1):5–32
  • Britt CL, Weisburd D (2010) Statistical power. In: Piquero AR, Weisburd D (eds) Handbook of quantitative criminology. Springer, pp 313–332
  • Broidy LM, Daday JK, Crandall CS, Sklar DP, Jost PF (2006) Exploring demographic, structural, and behavioral overlap among homicide offenders and victims. Homicide Stud 10(3):155–180
  • Camerer CF, Dreber A, Forsell E, Ho T-H, Huber J, Johannesson M, Kirchler M, Almenberg J, Altmejd A, Chan T, Heikensten E, Holzmeister F, Imai T, Isaksson S, Nave G, Pfeiffer T, Razen M, Wu H (2016) Evaluating replicability of laboratory experiments in economics. Science 351(6280):1433–1436
  • Campbell KM (2009) Impact of record-linkage methodology on performance indicators and multivariate relationships. J Subst Abuse Treat 36(1):110–117
  • Campbell KM, Deck D, Krupski A (2008) Record linkage software in the public domain: a comparison of Link Plus, The Link King, and a 'basic' deterministic algorithm. Health Inform J 14(1):5–15
  • Cesarini D, Lindqvist E, Östling R, Wallace B (2016) Wealth, health, and child development: evidence from administrative data on Swedish lottery players. Q J Econ 131(2):687–738
  • Chalfin A, Deza M (2017) The intergenerational effects of education on delinquency. J Econ Behav Org 159:553–571
  • Christen P (2012) Data matching: concepts and techniques for record linkage, entity resolution, and duplicate detection. Springer, New York
  • Christen P, Goiser K (2007) Quality and complexity measures for data linkage and deduplication. In: Guillet FJ, Hamilton HJ (eds) Quality measures in data mining. Springer, Berlin, pp 127–151
  • Clark DE, Hahn DR (1995) Comparison of probabilistic and deterministic record linkage in the development of a statewide trauma registry. Proc Annu Symp Comput Appl Med Care 1995:397–401
  • Cohen J (1992) Statistical power analysis. Curr Dir Psychol Sci 1(3):98–101
  • Comfort M, Nurse AM, McKay T, Kramer K (2011) Taking children into account: addressing the intergenerational effects of parental incarceration. Criminol Public Policy 10(3):839–850
  • Cook PJ, Kang S, Braga AA, Ludwig J, O'Brien ME (2015) An experimental evaluation of a comprehensive employment-oriented prisoner re-entry program. J Quant Criminol 31(3):355–382
  • Courtemanche CJ, Denteh A, Tchernis R (2018) Estimating the associations between SNAP and food insecurity, obesity, and food purchases with imperfect administrative measures of participation. Technical report, National Bureau of Economic Research
  • Curb JD, Ford CE, Pressel S, Palmer M, Babcock C, Hawkins CM (1985) Ascertainment of vital status through the National Death Index and the Social Security Administration. Am J Epidemiol 121(5):754–766
  • Dahl GB, Kostøl AR, Mogstad M (2014) Family welfare cultures. Q J Econ 129(4):1711–1752
  • Davis RC, Taylor BG (1997) A proactive response to family violence: the results of a randomized experiment. Criminology 35(2):307–333
  • Deaton A (2010) Instruments, randomization, and learning about development. J Econ Lit 48(2):424–455
  • Dezember A, Stoltz M, Marmolejo L, Kanewske LC, Feingold KD, Wire S, Duhaime L, Maupin C (2020) The lack of experimental research in criminology: evidence from Criminology and Justice Quarterly. J Exp Criminol
  • DiNardo JE, Pischke J-S (1997) The returns to computer use revisited: have pencils changed the wage structure too? Q J Econ 112(1):291–303
  • Dobbie W, Goldin J, Yang CS (2018) The effects of pretrial detention on conviction, future crime, and employment: evidence from randomly assigned judges. Am Econ Rev 108(2):201–240
  • Dobbie W, Grönqvist H, Niknami S, Palme M, Priks M (2018) The intergenerational effects of parental incarceration. Technical report, National Bureau of Economic Research
  • Dobkin C, Finkelstein A, Kluender R, Notowidigdo MJ (2018) The economic consequences of hospital admissions. Am Econ Rev 108(2):308–352
  • Doleac JL, Temple C, Pritchard D, Roberts A (2020) Which prisoner reentry programs work? Replicating and extending analyses of three RCTs. Int Rev Law Econ 62:105902
  • Duflo E, Glennerster R, Kremer M (2007) Using randomization in development economics research: a toolkit. In: Schultz TP, Strauss JA (eds) Handbook of development economics, vol 4. North-Holland, Amsterdam, pp 3895–3962
  • Dunn PM (1997) James Lind (1716–94) of Edinburgh and the treatment of scurvy. Arch Dis Child Fetal Neonatal Ed 76(1):F64–F65
  • Duwe G (2012) Evaluating the Minnesota Comprehensive Offender Reentry Plan (MCORP): results from a randomized experiment. Justice Q 29(3):347–383
  • Duwe G (2014) A randomized experiment of a prisoner reentry program: updated results from an evaluation of the Minnesota Comprehensive Offender Reentry Plan (MCORP). Crim Justice Stud 27(2):172–190
  • Enamorado T (2018) Active learning for probabilistic record linkage. Available at SSRN 3257638
  • Enamorado T, Fifield B, Imai K (2019) Using a probabilistic model to assist merging of large-scale administrative records. Am Polit Sci Rev 113(2):353–371
  • Farabee D, Zhang SX, Wright B (2014) An experimental evaluation of a nationally recognized employment-focused offender reentry program. J Exp Criminol 10(3):309–322
  • Farrington DP (1983) Randomized experiments on crime and justice. Crime Justice 4:257–308
  • Farrington DP (2003) A short history of randomized experiments in criminology: a meager feast. Eval Rev 27(3):218–227
  • Farrington DP (2006) Key longitudinal-experimental studies in criminology. J Exp Criminol 2(2):121–141
  • Farrington DP, Welsh BC (2006) A half century of randomized experiments on crime and justice. Crime Justice 34(1):55–132
  • Feigenbaum JJ (2016) Automated census record linking: a machine learning approach. Working paper
  • Fellegi IP, Sunter AB (1969) A theory for record linkage. J Am Stat Assoc 64(328):1183–1210
  • Ferrante A (1993) Developing an offender-based tracking system: the Western Australia INOIS project. Aust NZ J Criminol 26(3):232–250
  • Fischbacher U, Gächter S, Fehr E (2001) Are people conditionally cooperative? Evidence from a public goods experiment. Econ Lett 71(3):397–404
  • Fisher RA (1936) Design of experiments. Br Med J 1(3923):554
  • Freudenberg N, Wilets I, Greene MB, Richet BE (1998) Linking women in jail to community services: factors associated with rearrest and retention of drug-using women following release from jail. J Am Med Women's Assoc 53(2):89–93
  • Geerken MR (1994) Rap sheets in criminological research: considerations and caveats. J Quant Criminol 10(1):3–21
  • Gelber A, Isen A, Kessler JB (2016) The effects of youth employment: evidence from New York City lotteries. Q J Econ 131(1):423–460
  • Gerber AS, Malhotra N (2008) Do statistical reporting standards affect what is published? Publication bias in two leading political science journals. Q J Polit Sci 3(3):313–326
  • Gilbert DT, King G, Pettigrew S, Wilson TD (2016) Comment on "Estimating the reproducibility of psychological science". Science 351(6277):1037
  • Gomatam S, Carter R, Ariet M, Mitchell G (2002) An empirical comparison of record linkage procedures. Stat Med 21(10):1485–1496
  • Gordon BR, Zettelmeyer F, Bhargava N, Chapsky D (2019) A comparison of approaches to advertising measurement: evidence from big field experiments at Facebook. Mark Sci 38(2):193–225
  • Gottfredson DC, Najaka SS, Kearley BW, Rocha CM (2006) Long-term effects of participation in the Baltimore City drug treatment court: results from an experimental study. J Exp Criminol 2(1):67–98
  • Hansen B (2015) Punishment and deterrence: evidence from drunk driving. Am Econ Rev 105(4):1581–1617
  • Hansen B, Waddell GR (2018) Legal access to alcohol and criminality. J Health Econ 57:277–289
  • Harron K, Dibben C, Boyd J, Hjern A, Azimaee M, Barreto ML, Goldstein H (2017) Challenges in administrative data linkage for research. Big Data Soc 4(2):1–12
  • Heckman JJ, Smith JA (1995) Assessing the case for social experiments. J Econ Perspect 9(2):85–110
  • Heller SB (2014) Summer jobs reduce violence among disadvantaged youth. Science 346(6214):1219–1223
  • Heller SB, Shah AK, Guryan J, Ludwig J, Mullainathan S, Pollack HA (2017) Thinking, fast and slow? Some field experiments to reduce crime and dropout in Chicago. Q J Econ 132(1):1–54
  • Hill SJ (2017) Changing votes or changing voters? How candidates and election context swing voters and mobilize the base. Electoral Stud 48:131–148
  • Hjalmarsson R, Lindquist MJ (2012) Like godfather, like son: exploring the intergenerational nature of crime. J Hum Resour 47(2):550–582
  • Hyatt JM, Andersen SN (2019) On the potential of incorporating administrative register data into randomized experiments. J Exp Criminol 15(3):469–497
  • Imbens GW (2010) Better LATE than nothing: some comments on Deaton (2009) and Heckman and Urzua (2009). J Econ Lit 48(2):399–423
  • Ioannidis JP, Stanley TD, Doucouliagos H (2017) The power of bias in economics research. Econ J 127(605):F236–F265
  • Jelveh Z, McNeill M (2018) Downstream impact of probabilistic matching quality on prediction performance. Manuscript in progress
  • Johndrow J, Lum K, Dunson D (2018) Theoretical limits of microclustering for record linkage. Biometrika 105(2):431–446
  • Johnston A, Mas A (2018) Potential unemployment insurance duration and labor supply: the individual and market-level response to a benefit cut. J Polit Econ 126(6):2480–2522
  • Karr AF, Taylor MT, West SL, Setoguchi S, Kou TD, Gerhard T, Horton DB (2019) Comparing record linkage software programs and algorithms using real-world data. PLoS ONE 14(9):e0221459
  • Keizer K, Lindenberg S, Steg L (2008) The spreading of disorder. Science 322(5908):1681–1685
  • Khwaja AI, Mian A (2005) Do lenders favor politically connected firms? Rent provision in an emerging financial market. Q J Econ 120(4):1371–1411
  • Lahiri P, Larsen MD (2005) Regression analysis with linked data. J Am Stat Assoc 100(469):222–230
  • LaLonde RJ (1986) Evaluating the econometric evaluations of training programs with experimental data. Am Econ Rev 76(4):604–620
  • Lane J (2018) Building an infrastructure to support the use of government administrative data for program performance and social science research. Ann Am Acad Polit Soc Sci 675(1):240–252
  • Lattimore PK, MacKenzie DL, Zajac G, Dawes D, Arsenault E, Tueller S (2016) Outcome findings from the HOPE demonstration field experiment: is swift, certain, and fair an effective supervision strategy? Criminol Public Policy 15(4):1103–1141
  • Laub JH, Sampson RJ (1988) Unraveling families and delinquency: a reanalysis of the Gluecks' data. Criminology 26(3):355–380
  • Lauritsen JL (1999) Limitations in the use of longitudinal self-report data: a comment. Criminology 37(3):687–694
  • Liberman AM, Kirk DS, Kim K (2014) Labeling effects of first juvenile arrests: secondary deviance and secondary sanctioning. Criminology 52(3):345–370
  • Loeffler CE (2013) Does imprisonment alter the life course? Evidence on crime and employment from a natural experiment. Criminology 51(1):137–166
  • Lum C, Mazerolle L (2014) History of randomized controlled experiments in criminal justice. In: Encyclopedia of criminology and criminal justice. Springer, pp 2227–2239
  • Lynch J (2018) Not even our own facts: criminology in the era of big data. Criminology 56(3):437–454
  • Lyngstad TH, Skardhamar T (2011) Nordic register data and their untapped potential for criminological knowledge. Crime Justice 40(1):613–645
  • MacDonald JM, Morral AR, Raymond B, Eibner C (2007) The efficacy of the Rio Hondo DUI court: a 2-year field experiment. Eval Rev 31(1):4–23
  • McCord J (2003) Cures that harm: unanticipated outcomes of crime prevention programs. Ann Am Acad Polit Soc Sci 587(1):16–30
  • Moher D, Dulberg CS, Wells GA (1994) Statistical power, sample size, and their reporting in randomized controlled trials. J Am Med Assoc 272(2):122–124
  • Moore CL, Amin J, Gidding HF, Law MG (2014) A new method for assessing how sensitivity and specificity of linkage studies affects estimation. PLoS ONE 9(7):e103690
  • Morris NA, Slocum LA (2010) The validity of self-reported prevalence, frequency, and timing of arrest: an evaluation of data collected using a life event calendar. J Res Crime Delinq 47(2):210–240
  • Mueller-Smith M (2015) The criminal and labor market impacts of incarceration. Working paper
  • Nagin DS, Sampson RJ (2019) The real gold standard: measuring counterfactual worlds that matter most to social science and policy. Annu Rev Criminol 2:123–145
  • Nelson MS, Wooditch A, Dario LM (2015) Sample size, effect size, and statistical power: a replication study of Weisburd's paradox. J Exp Criminol 11(1):141–163
  • Neter J, Maynes ES, Ramanathan R (1965) The effect of mismatching on the measurement of response errors. J Am Stat Assoc 60(312):1005–1027
  • Newcombe HB, Kennedy JM, Axford S, James AP (1959) Automatic linkage of vital records. Science 130(3381):954–959
  • O'Brien DT, Sampson RJ (2015) Public and private spheres of neighborhood disorder: assessing pathways to violence using large-scale digital records. J Res Crime Delinq 52(4):486–510
  • Orchowsky S, Iwama J (2009) Improving state criminal history records: recidivism of sex offenders released in 2001. Report, Justice Research and Statistics Association
  • Petrosino A, Turpin-Petrosino C, Buehler J (2003) Scared Straight and other juvenile awareness programs for preventing juvenile delinquency: a systematic review of the randomized experimental evidence. Ann Am Acad Polit Soc Sci 589(1):41–62
  • Powell D, Seabury S (2018) Medical care spending and labor market outcomes: evidence from workers' compensation reforms. Am Econ Rev 108(10):2995–3027
  • Powers E, Witmer H (1951) An experiment in the prevention of delinquency: the Cambridge-Somerville Youth Study. Columbia University Press, New York
  • Price J, Buckles K, Van Leeuwen J, Riley I (2019) Combining family history and machine learning to link historical records. Working paper 26227, National Bureau of Economic Research
  • Pridemore WA, Makel MC, Plucker JA (2018) Replication in criminology and the social sciences. Annu Rev Criminol 1(1):19–38
  • Ridgeway G, Grogger J, Moyer RA, MacDonald JM (2019) Effect of gang injunctions on crime: a study of Los Angeles from 1988–2014. J Quant Criminol 35(3):517–541
  • Roberts J, Wells W (2010) The validity of criminal justice contacts reported by inmates: a comparison of self-reported data with official prison records. J Criminal Justice 38(5):1031–1037
  • Rossman SB, Roman JK, Zweig JM, Rempel M, Lindquist CH (2011) The multi-site adult drug court evaluation: executive summary. Urban Institute
  • Rothstein HR (2008) Publication bias as a threat to the validity of meta-analytic results. J Exp Criminol 4(1):61–81
  • Rubin DB (2008) Comment: the design and analysis of gold standard randomized experiments. J Am Stat Assoc 103(484):1350–1353
  • Sadosky P, Shrivastava A, Price M, Steorts RC (2015) Blocking methods applied to casualty records from the Syrian conflict. arXiv preprint arXiv:1510.07714
  • Sampson RJ (2010) Gold standard myths: observations on the experimental turn in quantitative criminology. J Quant Criminol 26(4):489–500
  • Sampson RJ, Laub JH (2003) Life-course desisters? Trajectories of crime among delinquent boys followed to age 70. Criminology 41(3):555–592
  • Sampson RJ, Winter AS (2018) Poisoned development: assessing childhood lead exposure as a cause of crime in a birth cohort followed through adolescence. Criminology 56(2):269–301
  • Sariyar M, Borg A, Pommerening K (2012) Active learning strategies for the deduplication of electronic patient data using classification trees. J Biomed Inform 45(5):893–900
  • Scheuren F, Winkler WE (1993) Regression analysis of data files that are computer matched, part I. Surv Methodol 19(1):39–58
  • Scheuren F, Winkler WE (1997) Regression analysis of data files that are computer matched, part II. Surv Methodol 23(2):157–165
  • Sedlmeier P, Gigerenzer G (1989) Do studies of statistical power have an effect on the power of studies? Psychol Bull 105(2):309–316
  • Sherman LW (2007) The power few: experimental criminology and the reduction of harm. J Exp Criminol 3(4):299–321
  • Sherman LW, Berk RA (1984) The Minneapolis domestic violence experiment. Police Foundation, Washington, DC
  • Sherman LW, Weisburd D (1995) General deterrent effects of police patrol in crime "hot spots": a randomized, controlled trial. Justice Q 12(4):625–648
  • Sherman LW, Schmidt JD, Rogan DP, Smith DA (1992) The variable effects of arrest on criminal careers: the Milwaukee domestic violence experiment. J Criminal Law Criminol 83(1):137–169
  • Sherman LW, Gottfredson DC, MacKenzie DL, Eck J, Reuter P, Bushway S (1997) Preventing crime: what works, what doesn't, what's promising. A report to the United States Congress. US Department of Justice, Office of Justice Programs, Washington, DC
  • Smith GJD, Bennett Moses L, Chan J (2017) The challenges of doing criminology in the big data era: towards a digital and data-driven approach. Br J Criminol 57(2):259–274
  • Smith JA, Todd PE (2001) Reconciling conflicting evidence on the performance of propensity-score matching methods. Am Econ Rev 91(2):112–118
  • Smith JA, Todd PE (2005) Does matching overcome LaLonde's critique of nonexperimental estimators? J Econom 125(1–2):305–353
  • Splawa-Neyman J, Dabrowska DM, Speed TP (1990 [1923]) On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Stat Sci 5(4):465–472
  • Stewart A, Dennison S, Allard T, Thompson C, Broidy L, Chrzanowski A (2015) Administrative data linkage as a tool for developmental and life-course criminology: the Queensland Linkage Project. Aust NZ J Criminol 48(3):409–428
  • Taxman FS, Caudy MS (2015) Risk tells us who, but not what or how. Criminol Public Policy 14(1):71–103
  • Tremblay RE, Vitaro F, Nagin D, Pagani L, Seguin JR (2003) The Montreal longitudinal and experimental study. In: Taking stock of delinquency. Springer, pp 205–254
  • Tromp M, Ravelli AC, Bonsel GJ, Hasman A, Reitsma JB (2011) Results from simulated data sets: probabilistic record linkage outperforms deterministic record linkage. J Clin Epidemiol 64(5):565–572
  • Van Schellen M, Apel R, Nieuwbeerta P (2012) The impact of military service on criminal offending over the life course: evidence from a Dutch conviction cohort. J Exp Criminol 8(2):135–164
  • Vivalt E (2017) The trajectory of specification searching and publication bias across methods and disciplines. Working paper
  • Watson CI, Fiumara GP, Tabassi E, Salamon WJ, Flanagan PA (2014) Fingerprint vendor technology evaluation. Report, NIST
  • Weisburd D (2003) Ethical practice and evaluation of interventions in crime and justice: the moral imperative for randomized trials. Eval Rev 27(3):336–354
  • Weisburd D (2010) Justifying the use of non-experimental methods and disqualifying the use of randomized controlled trials: challenging folklore in evaluation research in crime and justice. J Exp Criminol 6(2):209–227
  • Weisburd D, Petrosino A, Mason G (1993) Design sensitivity in criminal justice experiments. Crime Justice 17:337–379
  • Wildeman C, Andersen SH (2017) Paternal incarceration and children's risk of being charged by early adulthood: evidence from a Danish policy shock. Criminology 55(1):32–58
  • Winkler WE (2002) Methods for record linkage and Bayesian networks. U.S. Bureau of the Census, Statistical Research Division
  • Winkler WE (2006) Overview of record linkage and current research directions. U.S. Bureau of the Census
  • Yancey WE (2004) Improving EM algorithm estimates for record linkage parameters. U.S. Bureau of the Census
  • Zingmond DS, Ye Z, Ettner SL, Liu H (2004) Linking hospital discharge and death records: accuracy and sources of bias. J Clin Epidemiol 57(1):21–29


Acknowledgements

We extend our sincere thanks to Melissa McNeill at the University of Chicago Crime Lab for her work in developing the records matching algorithm employed in this paper. We would also like to thank Leslie Kellam, Ryang Hui Kim, Srivatsa Kothapally, Jens Ludwig, Jim Lynch, Mike Mueller-Smith, Aurelie Ouss, Greg Ridgeway, Jesse Rothstein and Greg Stoddard for helpful comments on this project. We thank Arnold Ventures for its generous support of the University of Chicago Crime Lab New York. Points of view or opinions contained within this document are those of the authors. They do not necessarily represent those of Arnold Ventures. Of course, all remaining errors are our own.

Author information

Correspondence to Sarah Tahamont.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Computational Details

In this appendix we provide additional details for how statistical power can be computed under two possible states of the world: (1) in the absence of linking errors and (2) in the presence of linking errors. We use the derivations in this appendix to empirically demonstrate the effect of linking errors on statistical power in a hypothetical experiment in the "Empirical Example" section of the paper.

For illustrative purposes, we will assume that a roster of individuals involved in a treatment program is being linked to arrest data to measure whether the program reduced the likelihood of arrest. Additionally, we will assume that a record-linkage algorithm was run on the arrest data and that there existed a unique identifier allowing us to measure when predicted links between two records represented true and false matches and when predicted non-links represented true and false non-matches.

We motivate the derivation by introducing a framework—a confusion matrix—that governs the incidence of linking errors in the arrest data. Each row of the confusion matrix represents a predicted class (predicted non-link and predicted link), while each column represents an actual class (true non-match and true match). The matrix thus allows us to understand the extent to which the algorithm is successful in classifying whether two records belong to the same person.

In the following confusion matrix, \(y^*\) represents the true state of the world and y represents the observed state of the world after linking. The cells provide counts of the number of true negatives, false negatives, false positives and true positives, respectively, in linking the data.

|         | \(y^*=0\) | \(y^*=1\) |
|---------|-----------|-----------|
| \(y=0\) | TN        | FN        |
| \(y=1\) | FP        | TP        |

The diagonal entries of the matrix correspond to an alignment of the true and observed states of the world—observations for which \(y^*\) = y = 0 are true negatives and observations for which \(y^*\) = y = 1 are true positives. The off-diagonal entries provide us with the number of linking errors. In particular, the 2,1 element of the matrix provides the number of false positive links—the number of times in which an observation that is truly \(y^*\) = 0 is mistakenly linked, so that y = 1. Similarly, the 1,2 element of the matrix provides the number of false negative links, where an observation that is truly \(y^*\) = 1 fails to be linked and is observed as y = 0.

The matrix allows us to compute four different rates capturing the success of a given linking strategy: the true positive and true negative rate and the false positive and false negative rate.

$$\begin{aligned} TPR&= \frac{TP}{TP+FN} \\ TNR&= \frac{TN}{TN+FP} \\ FPR&= 1-TNR \\ FNR&= 1-TPR \end{aligned}$$

The true positive rate (TPR) is the number of correctly linked positives (TP) divided by the number of actual positives (TP+FN). Likewise, the true negative rate (TNR) is the number of correctly linked negatives (TN) divided by the number of actual negatives (TN+FP). The corresponding false positive and false negative link rates are obtained by subtracting each of these quantities from 1. As we show in the "Derivation of Estimated Treatment Effects, Standard Errors and Statistical Power" section of the paper, estimated treatment effects will be attenuated under linking errors, with the attenuation proportional to 1-FPR-FNR. So long as FPR+FNR < 1, this is strict attenuation towards zero, but if FPR+FNR exceeds 1, the bias can flip the sign of the estimated effect.
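As a quick sketch of how these rates are computed in practice, the helper below takes the confusion-matrix counts from the 10,000-record example that follows:

```python
def linkage_rates(tn, fn, fp, tp):
    """Compute the four rates defined above from confusion-matrix counts."""
    tpr = tp / (tp + fn)   # correctly linked positives over actual positives
    tnr = tn / (tn + fp)   # correctly linked negatives over actual negatives
    return {"TPR": tpr, "TNR": tnr, "FPR": 1 - tnr, "FNR": 1 - tpr}

rates = linkage_rates(tn=3000, fn=1000, fp=2000, tp=4000)
assert abs(rates["FNR"] - 0.20) < 1e-9 and abs(rates["FPR"] - 0.40) < 1e-9
```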

To appreciate how this works, assume that the arrest dataset contained N = 10,000 records and that after running the matching algorithm, the following confusion matrix was generated:

|         | \(y^*=0\) | \(y^*=1\) |
|---------|-----------|-----------|
| \(y=0\) | TN = 3000 | FN = 1000 |
| \(y=1\) | FP = 2000 | TP = 4000 |

The error rates for the matching algorithm can be computed as:

  • FNR = \(\frac{1000}{4000 + 1000}\) = 0.20

  • FPR = \(\frac{2000}{3000 + 2000}\) = 0.40

Assume that to test the effectiveness of the treatment program, 1500 individuals were randomized, with one-half assigned to the treatment group (p = 0.5), that the control group mean was \(\frac{1}{3}\), and that the treatment effect was \(\tau = -\frac{1}{15}\).

In the true state of the world, there are 250 individuals arrested in the control group and 200 in the treatment group, reflecting the fact that \(\tau = -\frac{1}{15}\). We next apply the error rates from the matching algorithm to the control and treatment groups respectively to generate the following confusion matrices:

Control group

|         | \(y^*=0\) | \(y^*=1\) |
|---------|-----------|-----------|
| \(y=0\) | TN = 300  | FN = 50   |
| \(y=1\) | FP = 200  | TP = 200  |

Treatment group

|         | \(y^*=0\) | \(y^*=1\) |
|---------|-----------|-----------|
| \(y=0\) | TN = 330  | FN = 40   |
| \(y=1\) | FP = 220  | TP = 160  |

Let \(y^*_{T=0}\) be the true number of individuals arrested in the control group and \(y_{T=0}\) be the observed number of individuals arrested in the control group. We see that \(y^*_{T=0}=50+200=250\) and \(y_{T=0}=200+200=400\).

Let \(y^*_{T=1}\) be the true number of individuals arrested in the treatment group and \(y_{T=1}\) be the observed number of individuals arrested in the treatment group. We see that \(y^*_{T=1}=40+160=200\) and \(y_{T=1}=220+160=380\).

The observed treatment effect can be computed as \(\bar{y}_{T=1} - \bar{y}_{T=0} = \frac{380}{750} - \frac{400}{750} = -\frac{2}{75}\), which is equivalent to \(\tau * (TPR - FPR) = -\frac{1}{15} * (0.8 - 0.4) = -\frac{2}{75}\).
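This identity is easy to verify numerically; a minimal check using the counts from the two confusion matrices above:

```python
tau, tpr, fpr = -1/15, 0.8, 0.4

# Observed arrest shares are (FP + TP) / N in each experimental arm
y_bar_treat = (220 + 160) / 750
y_bar_ctrl = (200 + 200) / 750

observed_effect = y_bar_treat - y_bar_ctrl            # equals -2/75
assert abs(observed_effect - tau * (tpr - fpr)) < 1e-12
```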

In order to compute statistical power to detect a given potential treatment effect, we need a standard error, which is computed according to:

$$\begin{aligned} var(\hat{\tau }) = \frac{1}{p(1-p)} \frac{\sigma ^2}{N} \end{aligned}$$

The square root of this quantity is the standard error of the estimated treatment effect. N and p are simply the sample size and the proportion treated, but we also need to compute \(\sigma ^2\), the mean squared error from a regression of either \(y^*\) or y on the treatment indicator, depending on which state of the world we are in. We show how to compute \(\sigma ^2\) in the absence and presence of linking errors in "Appendix 2".

We can then compute the Type II error rate \(\beta\) for a candidate effect of magnitude \(\tau _h\) according to the expression below; statistical power is then \(1 - \beta\):

$$\begin{aligned} \beta = {{\Phi }} \left[ {-~{\Phi ^{-1}}} \left( \frac{\alpha }{2}\right) - \frac{\tau _h}{\sigma _{\tau _h}} \right] \end{aligned}$$

Carrying through the numerical example from our confusion table, power to detect a 20% reduction in arrest (\(\tau _h = \frac{1}{15}\), relative to the control group mean of \(\frac{1}{3}\)) is 81% in the true state of the world and just 72% in the state of the world with linking errors. What would have been an adequately well-powered experiment is no longer well-powered in the presence of modest linking errors.
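For readers who wish to reproduce these figures, the sketch below computes power from the group-level counts, using the RSS expression derived in "Appendix 2" and assuming scipy for the normal distribution; it returns roughly 0.81 for the true state of the world and roughly 0.73 for the state with linking errors, in line with the figures reported above:

```python
from scipy.stats import norm

def power(tau_h, n, p, rss, alpha=0.05):
    """Power to detect an effect of magnitude tau_h, given the residual
    sum of squares from the outcome regression (see Appendix 2)."""
    se = ((1 / (p * (1 - p))) * (rss / n) / n) ** 0.5
    return norm.cdf(abs(tau_h) / se - norm.ppf(1 - alpha / 2))

n, p, tau_h = 1500, 0.5, 1/15   # a 20% reduction from the control mean of 1/3

# RSS = N_{C,1}(1 - ybar_C) + N_{T,1}(1 - ybar_T), from Appendix 2
rss_true = 250 * (1 - 250/750) + 200 * (1 - 200/750)   # true counts
rss_obs = 400 * (1 - 400/750) + 380 * (1 - 380/750)    # counts under linking error

print(power(tau_h, n, p, rss_true))  # ~0.81
print(power(tau_h, n, p, rss_obs))   # ~0.73
```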

Appendix 2: Deriving Outcome Variance

In this section we show how to compute the residual sum of squares with a binary outcome and binary treatment in order to compute \(\sigma ^2\). Let \(\bar{y}_C\) equal the control group mean and \(\tau\) the treatment effect:

$$\begin{aligned} \sum _i (y_i - \hat{y}_i)^2&= \sum _i (y_i - \bar{y}_C - \tau T_i)^2 \end{aligned}$$

We can decompose the above equation into four mutually exclusive groups determined by whether an individual is in the treatment or control group, and whether their associated outcome is \(y=0\) or \(y=1\). Writing \(\bar{y}_T = \bar{y}_C + \tau\) for the treatment group mean and \(N_{j,k}\) for the number of individuals in group j with outcome k (so that \(n_C = N_{C,0} + N_{C,1}\)), we have:

$$\begin{aligned}&\sum _{i \in \{i | T_i=0, y_i=0\}} ( - \bar{y}_C)^2 + \sum _{i \in \{i | T_i=0, y_i=1\}} (1 - \bar{y}_C)^2 + \sum _{i \in \{i | T_i=1, y_i=0\}} (- \bar{y}_C - \tau )^2 + \sum _{i \in \{i | T_i=1, y_i=1\}} (1 - \bar{y}_C - \tau )^2\\&\quad = N_{C,0}~\bar{y}_C^2 + N_{C,1} + N_{C,1}~\bar{y}_C^2 - N_{C,1}~2\bar{y}_C + N_{T,0}\bar{y}_T^2 + N_{T,1} + N_{T,1}~\bar{y}_T^2 - N_{T,1}~2\bar{y}_T\\&\quad = n_C~\bar{y}_C^2 + N_{C,1} - N_{C,1}~2\bar{y}_C + n_T\bar{y}_T^2 + N_{T,1} - N_{T,1}~2\bar{y}_T\\&\quad = N_{C,1}~\bar{y}_C + N_{C,1} - N_{C,1}~2\bar{y}_C + N_{T,1}\bar{y}_T + N_{T,1} - N_{T,1}~2\bar{y}_T\\&\quad = N_{C,1} (\bar{y}_C + 1 - 2\bar{y}_C) + N_{T,1}(\bar{y}_T + 1 - 2\bar{y}_T)\\&\quad = N_{C,1} (1 - \bar{y}_C) + N_{T,1}(1 - \bar{y}_T) \end{aligned}$$
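A brute-force check of this closed form, using the control and treatment counts from the "Appendix 1" example (250 of 750 and 200 of 750 arrested):

```python
def rss_closed_form(n_c1, n_c, n_t1, n_t):
    """N_{C,1}(1 - ybar_C) + N_{T,1}(1 - ybar_T), the result derived above."""
    ybar_c, ybar_t = n_c1 / n_c, n_t1 / n_t
    return n_c1 * (1 - ybar_c) + n_t1 * (1 - ybar_t)

def rss_direct(n_c1, n_c, n_t1, n_t):
    """Sum squared residuals directly over the four treatment-by-outcome cells."""
    ybar_c, ybar_t = n_c1 / n_c, n_t1 / n_t
    return ((n_c - n_c1) * ybar_c**2 + n_c1 * (1 - ybar_c)**2
            + (n_t - n_t1) * ybar_t**2 + n_t1 * (1 - ybar_t)**2)

assert abs(rss_closed_form(250, 750, 200, 750)
           - rss_direct(250, 750, 200, 750)) < 1e-9
```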

Appendix 3: Maximizing RSS

We now show why Eq. 10 is maximized when the control group mean, \(\bar{y}_C\), plus the treatment effect, \(\tau\), equals 0.5. Let \(N_{T,1}\) equal the number of individuals in the treatment group with \(y=1\) and \(N_{T,0}\) equal the number of individuals in the treatment group with \(y=0\). Note that \(N_{T,1} = (\bar{y}_C+ \tau ) N_T\) and \(N_{T,0} = N_T (1 - (\bar{y}_C+\tau ))\).

$$\begin{aligned} \sum _i (y_i - \hat{y}_i)^2&= \sum _i (y_i - (\bar{y}_C + \tau T_i))^2 \end{aligned}$$

For a given control group mean we will take derivatives with respect to \(\tau\); since the control-group terms do not depend on \(\tau\), we need only consider individuals in the treatment group. We can decompose the previous equation into:

$$\begin{aligned} \sum _{i \in T} (y_i - \hat{y}_i)^2&= N_{T,0} (- \bar{y}_C - \tau )^2 + N_{T,1} (1 - \bar{y}_C - \tau )^2\\&= N_T (1 - (\bar{y}_C+\tau )) (- \bar{y}_C - \tau )^2 + N_T(\bar{y}_C+ \tau ) (1 - \bar{y}_C - \tau )^2\\&= N_T (1 - \bar{y}_C -\tau ) (\bar{y}_C + \tau )^2 + N_T(\bar{y}_C + \tau ) (1 - \bar{y}_C - \tau )^2\\&= N_T (1 - \bar{y}_C -\tau ) (\bar{y}_C+ \tau ) (\bar{y}_C + \tau + 1 - \bar{y}_C - \tau )\\&= N_T (1 - \bar{y}_C -\tau ) (\bar{y}_C + \tau ) \end{aligned}$$

Let \(\kappa = N_T (1 - \bar{y}_C -\tau ) (\bar{y}_C + \tau )\), then taking derivatives with respect to \(\tau\):

$$\begin{aligned} \frac{d\kappa }{d\tau }&= N_T (1 -2\bar{y}_C - 2\tau ) \end{aligned}$$

Setting the previous equation to zero and solving for \(\tau\) leads to

$$\begin{aligned} \bar{y}_C + \tau = 0.5 \end{aligned}$$
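A quick numerical confirmation: sweeping the treated mean \(m = \bar{y}_C + \tau\) over (0, 1), the treatment-group RSS \(N_T(1-m)m\) peaks at \(m = 0.5\):

```python
n_t = 750
ms = [i / 1000 for i in range(1, 1000)]   # candidate treated means m
rss = [n_t * (1 - m) * m for m in ms]     # treatment-group RSS at each m
assert ms[rss.index(max(rss))] == 0.5     # maximized where ybar_C + tau = 0.5
```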

Appendix 4: Proof for Power Attenuation

In this section we show that even when the standard error estimated under linking error is smaller than the standard error estimated under no error, statistical power will still be larger in the latter scenario. Let \(\eta\) be the true positive rate, \(\omega\) the false positive rate, \(\tau ^*\) the true treatment effect, \(\sigma _{\tau ^*}\) the true standard error, \(\hat{\tau }\) the observed treatment effect, and \(\sigma _{\hat{\tau }}\) the observed standard error. Note that \(0 \le \eta \le 1\) and \(0 \le \omega \le 1\). In order to show that

$$\begin{aligned} \frac{\tau ^*}{\sigma _{\tau ^*}} > \frac{\hat{\tau }}{\sigma _{\hat{\tau }}} \end{aligned}$$

we use the result from Eq. 7 to substitute for the observed treatment effect to get

$$\begin{aligned} \frac{\tau ^*}{\sigma _{\tau ^*}} > \frac{(\eta -\omega )\tau ^* }{\sigma _{\hat{\tau }}} \end{aligned}$$

and then rearrange terms to get the following:

$$\begin{aligned} \sigma _{\hat{\tau }} > (\eta -\omega )\sigma _{\tau ^*} \end{aligned}$$

It is straightforward to show that this is equivalent to

$$\begin{aligned} \sqrt{\widehat{RSS}} > (\eta -\omega ) \sqrt{RSS^*} \end{aligned}$$

Or that

$$\begin{aligned} \widehat{RSS} - (\eta -\omega )^2 RSS^* > 0 \end{aligned}$$

where \(\widehat{RSS}\) is the residual sum of squares with linking error and \(RSS^*\) is the residual sum of squares without linking error.

In the following, \(N_{j,1}^{*}\) represents the number of observations in group \(j \in \{T, C\}\) for which the true value of the outcome is \(y^{*}=1\), and \(N_{j,0}^{*}\) the number for which \(y^{*}=0\). This allows us to write the last inequality above as

$$\begin{aligned} \sum _{j \in \{T,C\}} (\eta ~N_{j,1}^{*} + \omega ~N_{j,0}^{*}) \left( 1 - \frac{\eta ~N_{j,1}^{*} + \omega ~N_{j,0}^{*}}{N_j^{*}} \right) - (\eta -\omega )^2 \sum _{j \in \{T,C\}} N_{j,1}^{*} \left( 1 - \frac{N_{j,1}^{*}}{N_j^{*}}\right)&> 0 \implies \\ \sum _{j \in \{T,C\}} \left[ (\eta ~N_{j,1}^{*} + \omega ~N_{j,0}^{*}) \left( \frac{ (1-\eta ) N_{j,1}^{*}+ (1-\omega ) N_{j,0}^{*} }{N_{j,1}^{*}+N_{j,0}^{*}} \right) - (\eta -\omega )^2 \frac{N_{j,1}^{*}~N_{j,0}^{*}}{N_{j,1}^{*}+N_{j,0}^{*}} \right]&> 0 \implies \\ \sum _{j \in \{T,C\}} \frac{\eta (1-\eta ) N_{j,1}^{*2}+ \omega (1-\omega ) N_{j,0}^{*2} + N_{j,1}^{*} N_{j,0}^{*} \left[ \eta (1-\omega ) + (1-\eta ) \omega - (\eta - \omega )^2 \right] }{N_{j,1}^{*}+N_{j,0}^{*}}&> 0 \implies \\ \sum _{j \in \{T,C\}} \frac{\eta (1-\eta ) N_{j,1}^{*2}+ \omega (1-\omega ) N_{j,0}^{*2} + N_{j,1}^{*} N_{j,0}^{*} \left[ \eta + \omega - \eta ^2 - \omega ^2\right] }{N_{j,1}^{*}+N_{j,0}^{*}}&> 0 \end{aligned}$$

For \(0< \eta < 1\) and \(0< \omega < 1\), all terms in the numerator of the last inequality are positive, satisfying the condition.
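The final inequality can also be spot-checked numerically across random error rates and cell counts; a minimal sketch:

```python
import random

rng = random.Random(1)
for _ in range(10_000):
    eta, omega = rng.random(), rng.random()   # TPR and FPR
    total = 0.0
    for _group in range(2):                   # j in {T, C}
        n1, n0 = rng.randint(1, 500), rng.randint(1, 500)
        # Per-group RSS with linking error, from the first line of the chain
        rss_hat_j = ((eta * n1 + omega * n0)
                     * ((1 - eta) * n1 + (1 - omega) * n0) / (n1 + n0))
        # Per-group RSS without linking error: N1 * N0 / (N1 + N0)
        rss_star_j = n1 * n0 / (n1 + n0)
        total += rss_hat_j - (eta - omega) ** 2 * rss_star_j
    assert total > 0
```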

Appendix 5: Treatment Heterogeneity Correlated With Matching Error

In this section we demonstrate how treatment effect heterogeneity that is correlated with linking error can impact coefficient estimates. Consider a dichotomous covariate G which takes on two values, M and F. We assume that linking error rates within group are equal across treatment and control, but that \(TPR_{M} \ne TPR_{F}\) and \(FPR_{M} \ne FPR_{F}\). Further, assume that \(\tau _{M} \ne \tau _{F}\). We rewrite Eq. 4 as:

$$\begin{aligned} \hat{\tau } =&\sum _{j \in \{0,1\}} \sum _{g \in \{M,F\}} P(y_i = 1, y_i^{*}=j, G_i=g|T_i = 1) \\&-\, ~ P(y_i = 1, y_i^{*}=j, G_i=g|T_i = 0) \\ =&~ \sum _{j \in \{0,1\}} \sum _{g \in \{M,F\}} P(y_i = 1| y_i^{*}=j, G_i=g, T_i = 1) P(y_i^{*}=j| G_i=g, T_i = 1) P( G_i=g| T_i = 1) \\&-\, ~ P(y_i = 1| y_i^{*}=j, G_i=g, T_i = 0) P(y_i^{*}=j| G_i=g, T_i = 0) P( G_i=g| T_i = 0) \\ =&~ P(M) (TPR_{M} - FPR_{M}) \tau _M + P(F) (TPR_{F} - FPR_{F}) \tau _F \\ \end{aligned}$$

When \(\tau _M\) and \(\tau _F\) have the same sign, linking error only attenuates the pooled treatment effect in absolute value. However, if the signs of \(\tau _M\) and \(\tau _F\) differ, the observed treatment effect may be greater than the true average treatment effect in absolute value.
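A small numerical illustration of this point, with hypothetical subgroup shares, error rates and subgroup effects:

```python
def pooled_observed_effect(p_m, tpr_m, fpr_m, tau_m, tpr_f, fpr_f, tau_f):
    """P(M)(TPR_M - FPR_M)tau_M + P(F)(TPR_F - FPR_F)tau_F, as derived above."""
    return p_m * (tpr_m - fpr_m) * tau_m + (1 - p_m) * (tpr_f - fpr_f) * tau_f

# Same-signed subgroup effects: the observed pooled effect (-0.05) is
# attenuated relative to the true pooled effect (-0.075)
print(pooled_observed_effect(0.5, 0.9, 0.1, -0.10, 0.6, 0.2, -0.05))

# Opposite-signed effects with asymmetric error rates: the observed pooled
# effect (-0.049) exceeds the true pooled effect (-0.01) in absolute value
print(pooled_observed_effect(0.5, 0.2, 0.1, 0.10, 0.95, 0.05, -0.12))
```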


Cite this article

Tahamont, S., Jelveh, Z., Chalfin, A. et al. Dude, Where’s My Treatment Effect? Errors in Administrative Data Linking and the Destruction of Statistical Power in Randomized Experiments. J Quant Criminol 37, 715–749 (2021). https://doi.org/10.1007/s10940-020-09461-x
