Composite Bloom Filters for Secure Record Linkage
IEEE Transactions on Knowledge and Data Engineering (IF 8.9), Pub Date: 2014-12-01, DOI: 10.1109/tkde.2013.91
Elizabeth Ashley Durham 1, Murat Kantarcioglu 2, Yuan Xue 3, Csaba Toth 1, Mehmet Kuzu 4, Bradley Malin 5

The process of record linkage seeks to integrate instances that correspond to the same entity. Record linkage has traditionally been performed through the comparison of identifying field values (e.g., Surname). However, when databases are maintained by disparate organizations, the disclosure of such information can breach the privacy of the corresponding individuals. Various private record linkage (PRL) methods have been developed to obscure such identifiers, but they vary widely in their ability to balance the competing goals of accuracy, efficiency, and security. The tokenization and hashing of field values into Bloom filters (BF) enables greater linkage accuracy and efficiency than other PRL methods, but the encodings may be compromised through frequency-based cryptanalysis. Our objective is to adapt a BF encoding technique to mitigate such attacks with minimal sacrifices in accuracy and efficiency. To accomplish these goals, we introduce a statistically informed method to generate BF encodings that integrate bits from multiple fields, the frequencies of which are provably associated with a minimum number of fields. Our method enables a user-specified tradeoff between security and accuracy. We compare our encoding method with other techniques using a public dataset of voter registration records and demonstrate that the increases in security come with only minor losses to accuracy.
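
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal Python sketch of a field-level Bloom filter encoding: a field value is tokenized into character bigrams, each bigram is hashed into a fixed-length bit array, and two encodings are compared with the Dice coefficient. It is not the composite method of the paper. The function names, the padding character, the use of seeded SHA-256 as the hash family, and the parameters `m` (filter length) and `k` (hashes per token) are all assumptions made for illustration.

```python
import hashlib


def bigrams(value):
    """Split a field value into padded character bigrams (illustrative tokenizer)."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value, m=256, k=4):
    """Hash each bigram of `value` into an m-bit Bloom filter using k seeded hashes.

    Seeded SHA-256 is an assumption of this sketch, not the paper's hash family.
    """
    bits = [0] * m
    for token in bigrams(value):
        for seed in range(k):
            digest = hashlib.sha256(f"{seed}:{token}".encode("utf-8")).digest()
            position = int.from_bytes(digest[:8], "big") % m
            bits[position] = 1
    return bits


def dice(a, b):
    """Dice coefficient between two equal-length bit arrays."""
    common = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2 * common / total if total else 0.0


if __name__ == "__main__":
    smith = bloom_encode("Smith")
    smyth = bloom_encode("Smyth")
    jones = bloom_encode("Jones")
    print("Smith vs Smyth:", round(dice(smith, smyth), 3))
    print("Smith vs Jones:", round(dice(smith, jones), 3))
```

Because similar strings share most of their bigrams, their encodings share most of their set bits, which is what lets approximate matching work on the encoded values. The same determinism is what exposes per-field encodings to the frequency-based attacks the abstract mentions.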

Updated: 2014-12-01