Differential identifiability clustering algorithms for big data analysis
Science China Information Sciences (IF 7.3), Pub Date: 2021-03-31, DOI: 10.1007/s11432-020-2910-1
Tao Shang, Zheng Zhao, Xujie Ren, Jianwei Liu

Individual privacy preservation has become an important issue with the development of big data technology. The definition of ρ-differential identifiability (DI) precisely matches the legal definitions of privacy and gives practitioners an easy parameterization approach: they can set privacy parameters based on the privacy concept of individual identifiability. However, differential identifiability has so far been applied only to some simple queries and achieved through the Laplace mechanism, which cannot address the complex privacy preservation problems that arise in big data analysis. In this paper, we propose a new exponential mechanism and composition properties for differential identifiability, and then apply differential identifiability to the k-means and k-prototypes algorithms on the MapReduce framework. The DI k-means algorithm uses the usual Laplace mechanism and composition properties for numerical databases, while the DI k-prototypes algorithm uses the new exponential mechanism and composition properties for mixed databases. The experimental results show that both the DI k-means and DI k-prototypes algorithms satisfy differential identifiability.
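As a rough illustration of the general pattern behind Laplace-perturbed k-means, the sketch below runs one iteration in which per-cluster sums and counts are perturbed with Laplace noise before the centroids are recomputed. It is not the paper's DI k-means algorithm. The function names, the single shared `scale` parameter, and the rule for keeping an old centroid are assumptions made for this example. The abstract does not say how the noise scale is derived from ρ, so the scale is left as a plain input.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) as a difference of two exponentials."""
    if scale == 0:
        return 0.0
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_centroid_update(points, centroids, scale, rng):
    """One k-means iteration with Laplace noise on per-cluster sums and counts.

    `scale` is a hypothetical placeholder: calibrating it from the
    identifiability parameter rho is outside the scope of this sketch.
    """
    k = len(centroids)
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k

    # Assignment step (roughly the "map" side): find the nearest centroid.
    for p in points:
        j = min(
            range(k),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
        )
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]

    # Update step (roughly the "reduce" side): perturb, then divide.
    new_centroids = []
    for j in range(k):
        noisy_count = counts[j] + laplace_noise(scale, rng)
        if noisy_count < 1.0:
            # Assumption: keep the old centroid when the noisy count is too small.
            new_centroids.append(list(centroids[j]))
            continue
        new_centroids.append(
            [(sums[j][d] + laplace_noise(scale, rng)) / noisy_count for d in range(dim)]
        )
    return new_centroids


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    start = [(0.0, 0.0), (10.0, 10.0)]
    print(noisy_centroid_update(data, start, scale=0.5, rng=rng))
```

With `scale=0` the function reduces to an ordinary k-means update, which is a convenient sanity check. A real mechanism would calibrate the noise for sums and counts separately, according to their sensitivities.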




Updated: 2021-04-06