How Well Do Automated Linking Methods Perform? Lessons from US Historical Data
Journal of Economic Literature (IF 12.6), Pub Date: 2020-12-09, DOI: 10.1257/jel.20191526
Martha Bailey, Connor Cole, Morgan Henderson, Catherine Massey

This paper reviews the literature in historical record linkage in the United States and examines the performance of widely used record-linking algorithms and common variations in their assumptions. We use two high-quality, hand-linked data sets and one synthetic ground truth to examine the direct effects of linking algorithms on data quality. We find that (i) no algorithm (including hand linking) consistently produces representative samples; (ii) 15 to 37 percent of links chosen by widely used algorithms are classified as errors by trained human reviewers; and (iii) false links are systematically related to baseline sample characteristics, showing that some algorithms may introduce systematic measurement error into analyses. A case study shows that the combined effects of (i)–(iii) attenuate estimates of the intergenerational income elasticity by up to 29 percent, and common variations in algorithm assumptions result in greater attenuation. As current practice moves to automate linking and increase link rates, these results highlight the important potential consequences of linking errors on inferences with linked data. We conclude with constructive suggestions for reducing linking errors and directions for future research. (JEL C45, C81, J62, N31, N32)
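
The attenuation mechanism described in the abstract can be illustrated with a small simulation. This is a minimal sketch and not the authors' method or data: it assumes a hypothetical true elasticity `beta`, standard normal log incomes, and false links that are purely random (the paper finds that real false links are systematically related to sample characteristics, which this toy example does not capture). The names `ols_slope`, `simulate`, and `false_link_rate` are illustrative only.

```python
import random


def ols_slope(xs, ys):
    """Slope from a simple OLS regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def simulate(n=20000, beta=0.4, false_link_rate=0.25, seed=0):
    """Return (slope with true links, slope with some false links).

    Assumption: a false link pairs a son with a father drawn at random
    from the same population, so it carries no information about the son.
    """
    rng = random.Random(seed)
    # Hypothetical log incomes: fathers are standard normal,
    # sons depend on their true father with elasticity beta.
    father = [rng.gauss(0, 1) for _ in range(n)]
    son = [beta * f + rng.gauss(0, 1) for f in father]
    # With probability false_link_rate, replace the true father
    # with a randomly chosen one (a linking error).
    linked_father = [
        rng.choice(father) if rng.random() < false_link_rate else f
        for f in father
    ]
    return ols_slope(father, son), ols_slope(linked_father, son)


if __name__ == "__main__":
    true_slope, linked_slope = simulate()
    print(f"slope with true links:  {true_slope:.3f}")
    print(f"slope with false links: {linked_slope:.3f}")
```

Under these assumptions the estimated slope shrinks toward zero by roughly the share of false links, so the slope with errors is close to `beta * (1 - false_link_rate)`. Errors that are correlated with income or other characteristics, as the abstract reports, can bias estimates in ways this sketch does not show.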

Updated: 2020-12-09