ADGN: An Algorithm for Record Linkage Using Address, Date of Birth, Gender and Name
Statistics and Public Policy (IF 1.5), Pub Date: 2017-01-01, DOI: 10.1080/2330443x.2017.1389620
Stephen Ansolabehere, Eitan D. Hersh

ABSTRACT This article presents an algorithm for record linkage that uses multiple indicators derived from combinations of fields commonly found in databases. Specifically, the quadruplet of Address (A), Date of Birth (D), Gender (G), and Name (N), as well as any triplet of A-D-G-N (i.e., ADG, ADN, AGN, and DGN), links records with extremely high likelihood. Matching on multiple identifiers avoids problems of missing data, inconsistent fields, and typographical errors. Using a very large database from the State of Texas, we show that exact matches on combinations of A, D, G, and N produce a match rate comparable to that of the 9-digit Social Security Number. Further examination of the linkage rates shows that reporting the data at a higher level of aggregation (such as Birth Year instead of Date of Birth) and omitting names make correct matches between databases highly unlikely, thereby protecting individuals' records.
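The abstract describes linking records on exact agreement of the ADGN quadruplet or of any of its four triplets. The following is a minimal Python sketch of that idea, not the authors' implementation. The field names `address`, `dob`, `gender`, and `name`, the `normalize` cleanup step, and the rule that a key must identify exactly one record in the second database are all hypothetical choices made for illustration; the article's actual preprocessing and tie-breaking may differ.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical field names; real databases will use their own column names.
FIELDS = ("address", "dob", "gender", "name")

# The ADGN quadruplet plus every triplet: ADG, ADN, AGN, DGN.
KEY_SETS = [FIELDS] + list(combinations(FIELDS, 3))


def normalize(value):
    """Assumed light cleanup: lowercase and collapse whitespace."""
    if value is None:
        return None
    text = " ".join(str(value).split()).lower()
    return text or None


def make_key(record, fields):
    """Build an exact-match key; return None if any field is missing."""
    parts = tuple(normalize(record.get(f)) for f in fields)
    return None if None in parts else parts


def link(left, right):
    """Link records in `left` to records in `right` by exact match on any key set.

    A left record is linked when some key set matches exactly and that key
    identifies a single right-hand record (the uniqueness rule is an
    assumption of this sketch). Returns {left index: right index}.
    """
    links = {}
    for fields in KEY_SETS:
        # Index the right-hand database on this combination of fields.
        index = defaultdict(list)
        for j, rec in enumerate(right):
            key = make_key(rec, fields)
            if key is not None:
                index[key].append(j)
        for i, rec in enumerate(left):
            if i in links:
                continue  # already linked by an earlier key set
            key = make_key(rec, fields)
            if key is not None and len(index.get(key, [])) == 1:
                links[i] = index[key][0]
    return dict(sorted(links.items()))


if __name__ == "__main__":
    voters = [
        {"address": "12 Elm St", "dob": "1970-03-02", "gender": "F", "name": "Ana Ruiz"},
        {"address": "9 Oak Ave", "dob": "1985-11-20", "gender": "M", "name": "Lee Park"},
    ]
    other = [
        # Address missing: only the DGN triplet can match.
        {"address": None, "dob": "1970-03-02", "gender": "F", "name": "ana  ruiz"},
        # Typo in the name: only the ADG triplet can match.
        {"address": "9 Oak Ave", "dob": "1985-11-20", "gender": "M", "name": "Lee Prak"},
    ]
    print(link(voters, other))  # {0: 0, 1: 1}
```

Because each key set is tried independently, a record with a missing address can still link through DGN, and a typo in one field leaves intact the triplet that omits that field, which mirrors the abstract's argument for matching on multiple identifiers.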

Updated: 2017-01-01