PrivBayes
ACM Transactions on Database Systems (IF 2.2), Pub Date: 2017-10-27, DOI: 10.1145/3134428
Jun Zhang, Graham Cormode, Cecilia M. Procopiuc, Divesh Srivastava, Xiaokui Xiao

Privacy-preserving data publishing is an important problem that has been the focus of extensive study. The state-of-the-art solution for this problem is differential privacy, which offers a strong degree of privacy protection without making restrictive assumptions about the adversary. Existing techniques using differential privacy, however, cannot effectively handle the publication of high-dimensional data. In particular, when the input dataset contains a large number of attributes, existing methods require injecting a prohibitive amount of noise compared to the signal in the data, which renders the published data next to useless. To address the deficiency of the existing methods, this paper presents PrivBayes, a differentially private method for releasing high-dimensional data. Given a dataset D, PrivBayes first constructs a Bayesian network N, which (i) provides a succinct model of the correlations among the attributes in D and (ii) allows us to approximate the distribution of data in D using a set P of low-dimensional marginals of D. After that, PrivBayes injects noise into each marginal in P to ensure differential privacy and then uses the noisy marginals and the Bayesian network to construct an approximation of the data distribution in D. Finally, PrivBayes samples tuples from the approximate distribution to construct a synthetic dataset, and then releases the synthetic data. Intuitively, PrivBayes circumvents the curse of dimensionality, as it injects noise into the low-dimensional marginals in P instead of the high-dimensional dataset D. Private construction of Bayesian networks turns out to be significantly challenging, and we introduce a novel approach that uses a surrogate function for mutual information to build the model more accurately. We experimentally evaluate PrivBayes on real data and demonstrate that it significantly outperforms existing solutions in terms of accuracy.
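The pipeline in the abstract has three stages: fit a Bayesian network, perturb its low-dimensional marginals, and sample synthetic tuples from the resulting model. The Python sketch below illustrates only the last two stages, under simplifying assumptions that are ours and not the paper's: the network structure is supplied by hand and treated as public (PrivBayes instead learns it privately, using the surrogate for mutual information mentioned in the abstract); attributes are small categorical domains encoded as integers; neighboring datasets differ by replacing one tuple, so each count marginal has L1 sensitivity 2; and the budget epsilon is split evenly across the marginals by sequential composition, with the Laplace mechanism applied to each. The paper's own budget allocation and marginal handling may differ. The names `noisy_conditionals` and `sample_synthetic` are hypothetical.

```python
import itertools
import random
from collections import Counter


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_conditionals(data, domains, network, epsilon, rng):
    """Hypothetical helper: for each (attribute, parents) pair in `network`,
    build a Laplace-perturbed count marginal and derive Pr[attribute | parents]."""
    # Assumption: each count table has L1 sensitivity 2 when one tuple is replaced,
    # and epsilon is split evenly over the len(network) tables (sequential composition).
    scale = 2.0 * len(network) / epsilon
    conditionals = []
    for attr, parents in network:
        cols = tuple(parents) + (attr,)
        counts = Counter(tuple(row[c] for c in cols) for row in data)
        table = {}
        for parent_vals in itertools.product(*(range(domains[p]) for p in parents)):
            # Perturb every cell, including empty ones, then clip negatives to zero.
            noisy = [max(0.0, counts[parent_vals + (v,)] + laplace(scale, rng))
                     for v in range(domains[attr])]
            total = sum(noisy)
            # Normalize within this parent configuration (post-processing); fall back
            # to uniform if every noisy count was clipped to zero.
            table[parent_vals] = ([c / total for c in noisy] if total > 0
                                  else [1.0 / domains[attr]] * domains[attr])
        conditionals.append((attr, tuple(parents), table))
    return conditionals


def sample_synthetic(conditionals, num_attrs, n, rng):
    """Ancestral sampling: `conditionals` must be listed in topological order."""
    rows = []
    for _ in range(n):
        row = [None] * num_attrs
        for attr, parents, table in conditionals:
            # Parents were already sampled earlier in this row.
            probs = table[tuple(row[p] for p in parents)]
            row[attr] = rng.choices(range(len(probs)), weights=probs)[0]
        rows.append(tuple(row))
    return rows


# Usage on toy data: attribute 1 copies attribute 0, attribute 2 is independent.
rng = random.Random(0)
domains = [2, 2, 2]
data = []
for _ in range(1000):
    a = rng.randrange(2)
    data.append((a, a, rng.randrange(2)))
network = [(0, ()), (1, (0,)), (2, ())]  # hand-picked structure, assumed public
conds = noisy_conditionals(data, domains, network, epsilon=1.0, rng=rng)
synthetic = sample_synthetic(conds, len(domains), 1000, rng)
```

Because noise is added only to tables over one attribute plus its parents, the noise scale depends on the number of marginals rather than on the size of the full joint domain, which is the intuition the abstract gives for avoiding the curse of dimensionality. In this toy run the strong correlation between attributes 0 and 1 should survive in the synthetic rows, since the Laplace noise is small relative to counts of roughly 500.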

Updated: 2017-10-27