Field Weights Computation for Probabilistic Record Linkage in Presence of Missing Data
International Journal of Pattern Recognition and Artificial Intelligence (IF 1.5), Pub Date: 2020-02-21, DOI: 10.1142/s0218001420590466
Yinghao Zhang, Senlin Xu, Mingfan Zheng, Xinran Li

Record linkage is the task of identifying which records refer to the same entity. When records in different data sources share no common key and their identifier fields contain typographical errors, the extended Fellegi–Sunter probabilistic record linkage method proposed by Winkler, which takes field similarity into account, is to our knowledge one of the most effective ways to perform record linkage. However, this method cannot efficiently handle missing values in the fields: it assigns an inappropriate weight to record pairs that contain missing data. To improve the performance of Winkler's probabilistic record linkage method in the presence of missing values, we propose a solution that adjusts a record pair's weight when missing data occur, which improves the accuracy of Winkler's record linkage decisions without a substantial increase in computation time.
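
To make the weighting idea concrete, here is a minimal Python sketch of Fellegi–Sunter style field weights with partial agreement. It is an illustration under stated assumptions, not the authors' method: the abstract does not specify how the paper adjusts weights, so a missing comparison is simply given a neutral weight of 0 here, which is a common baseline. The names `field_weight` and `pair_weight` are hypothetical, `m` and `u` are the usual agreement probabilities for matching and non-matching pairs, and the linear interpolation by similarity is a simplification of Winkler-style partial agreement.

```python
import math
from typing import Optional, Sequence


def field_weight(similarity: Optional[float], m: float, u: float) -> float:
    """Log2 likelihood-ratio weight for one field comparison.

    similarity: value in [0, 1], or None if either field value is missing.
    m: P(field agrees | records match); u: P(field agrees | records do not match).
    """
    if similarity is None:
        # Assumption: a missing comparison is treated as uninformative.
        # This is a neutral baseline, not the adjustment proposed in the paper.
        return 0.0
    agree = math.log2(m / u)                  # weight for full agreement
    disagree = math.log2((1 - m) / (1 - u))   # weight for full disagreement
    # Simplified partial agreement: interpolate linearly by similarity.
    return disagree + similarity * (agree - disagree)


def pair_weight(
    similarities: Sequence[Optional[float]],
    m_probs: Sequence[float],
    u_probs: Sequence[float],
) -> float:
    """Total weight of a record pair: the sum of its field weights."""
    return sum(
        field_weight(s, m, u)
        for s, m, u in zip(similarities, m_probs, u_probs)
    )


# Example: first field agrees exactly, second is missing, third disagrees.
total = pair_weight([1.0, None, 0.0], [0.9, 0.9, 0.9], [0.1, 0.1, 0.1])
```

With a zero weight, the missing field neither raises nor lowers the pair's total, so the link decision rests on the fields that were actually compared. Whether that is appropriate depends on the data, which is the question the paper addresses.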

Updated: 2020-02-21