Dude, Where’s My Treatment Effect? Errors in Administrative Data Linking and the Destruction of Statistical Power in Randomized Experiments
Journal of Quantitative Criminology (IF 2.8), Pub Date: 2020-06-09, DOI: 10.1007/s10940-020-09461-x
Sarah Tahamont, Zubin Jelveh, Aaron Chalfin, Shi Yan, Benjamin Hansen

Objective

The increasing availability of large administrative datasets has led to an exciting innovation in criminal justice research: using administrative data to measure experimental outcomes in lieu of costly primary data collection. We demonstrate that this type of randomized experiment can have an unfortunate consequence: the destruction of statistical power. Combining experimental data with administrative records to track outcomes of interest typically requires linking datasets without a common identifier. To minimize mistaken linkages, researchers often use stringent linking rules such as “exact matching” to ensure that speculative matches do not introduce errors into the analytic dataset. We show that this seemingly conservative approach leads to underpowered experiments, leaves real treatment effects undetected, and can therefore have profound implications for entire experimental literatures.

Methods

We derive an analytic result for the consequences of linking errors on statistical power and show how the problem varies across combinations of relevant inputs, including linking error rate, outcome density and sample size.
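
The relationship between linking error and power can be illustrated with a minimal sketch. This is not the authors' derivation. It assumes a binary outcome, a two-sided two-sample z-test for proportions, and missed links (false negatives) that occur at the same rate in both arms, so every true outcome rate is scaled by (1 - miss_rate). The function names, parameters, and example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, p_treat, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Unpooled standard error of the difference in proportions.
    se = sqrt(
        p_control * (1 - p_control) / n_per_arm
        + p_treat * (1 - p_treat) / n_per_arm
    )
    shift = abs(p_control - p_treat) / se
    # Probability of rejecting in either tail under the alternative.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


def power_with_missed_links(p_control, p_treat, n_per_arm, miss_rate, alpha=0.05):
    """Power when a share `miss_rate` of true outcomes fails to link.

    Assumption (not from the paper): missed links are unrelated to
    treatment status, so observed rates are the true rates times
    (1 - miss_rate).
    """
    if not 0 <= miss_rate < 1:
        raise ValueError("miss_rate must be in [0, 1)")
    keep = 1 - miss_rate
    return power_two_proportions(p_control * keep, p_treat * keep, n_per_arm, alpha)


if __name__ == "__main__":
    # Hypothetical experiment: 30% vs 24% outcome rates, 1,000 per arm.
    for miss_rate in (0.0, 0.1, 0.2, 0.3):
        power = power_with_missed_links(0.30, 0.24, 1000, miss_rate)
        print(f"miss_rate={miss_rate:.1f}  power={power:.3f}")
```

Under these assumptions the observed effect shrinks in proportion to the share of links that are missed, and power falls with it; how steeply depends on the outcome rate and the sample size.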

Results

Given that few experiments are overly well-powered, even small amounts of linking error can have considerable impact on Type II error rates. In contrast to exact matching, machine learning-based probabilistic matching algorithms allow researchers to recover a considerable share of the statistical power lost under stringent data-linking rules.
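
The contrast between exact and less stringent matching can be sketched as follows. This is not the machine learning-based probabilistic algorithm the abstract refers to; it swaps in a simple string-similarity threshold from Python's standard library (`difflib.SequenceMatcher`) purely for illustration. The function names, the threshold value, and the sample names are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def exact_link(record, candidates):
    """Return candidates that are character-for-character equal to the record."""
    return [c for c in candidates if c == record]


def fuzzy_link(record, candidates, threshold=0.9):
    """Return candidates whose similarity ratio to the record meets the threshold.

    A stand-in for probabilistic matching: a real linker would combine
    several fields (name, date of birth, and so on) and estimate match
    probabilities rather than apply one fixed cutoff.
    """
    target = normalize(record)
    return [
        c
        for c in candidates
        if SequenceMatcher(None, target, normalize(c)).ratio() >= threshold
    ]


if __name__ == "__main__":
    admin_records = ["Jonathon Smith", "Maria Lopez"]
    print(exact_link("Jonathan Smith", admin_records))
    print(fuzzy_link("Jonathan Smith", admin_records))
```

In this toy case the one-letter spelling difference defeats the exact rule, so a true outcome would go unrecorded, while the similarity rule retains the link. A looser rule also risks false links, which is why the cutoff is a tuning choice rather than a fixed value.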

Conclusion

Our results demonstrate that probabilistic linking substantially outperforms stringent linking criteria. Failure to implement linking procedures designed to reduce linking errors can have dire consequences for subsequent analyses and, more broadly, for the viability of this type of experimental research.




Updated: 2020-06-09