PrivBayes, ACM Transactions on Database Systems

PrivBayes
ACM Transactions on Database Systems (IF 2.2), Pub Date: 2017-10-27, DOI: 10.1145/3134428
Jun Zhang, Graham Cormode, Cecilia M. Procopiuc, Divesh Srivastava, Xiaokui Xiao

Privacy-preserving data publishing is an important problem that has been the focus of extensive study. The state-of-the-art solution for this problem is differential privacy, which offers a strong degree of privacy protection without making restrictive assumptions about the adversary. Existing techniques using differential privacy, however, cannot effectively handle the publication of high-dimensional data. In particular, when the input dataset contains a large number of attributes, existing methods require injecting a prohibitive amount of noise compared to the signal in the data, which renders the published data next to useless. To address the deficiency of the existing methods, this paper presents PrivBayes, a differentially private method for releasing high-dimensional data. Given a dataset D, PrivBayes first constructs a Bayesian network N, which (i) provides a succinct model of the correlations among the attributes in D and (ii) allows us to approximate the distribution of data in D using a set P of low-dimensional marginals of D. After that, PrivBayes injects noise into each marginal in P to ensure differential privacy and then uses the noisy marginals and the Bayesian network to construct an approximation of the data distribution in D. Finally, PrivBayes samples tuples from the approximate distribution to construct a synthetic dataset, and then releases the synthetic data. Intuitively, PrivBayes circumvents the curse of dimensionality, as it injects noise into the low-dimensional marginals in P instead of the high-dimensional dataset D. Private construction of Bayesian networks turns out to be significantly challenging, and we introduce a novel approach that uses a surrogate function for mutual information to build the model more accurately. We experimentally evaluate PrivBayes on real data and demonstrate that it significantly outperforms existing solutions in terms of accuracy.
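The pipeline described in the abstract (noisy low-dimensional marginals, then sampling along a Bayesian network) can be illustrated with a minimal sketch. This is not the authors' implementation, and it rests on several assumptions: the network structure is supplied by hand rather than learned privately (the surrogate-function step for mutual information is omitted entirely), the privacy budget is split evenly across the marginals, and the Laplace scale assumes that neighbouring datasets differ by replacing one tuple. All function names and parameters are hypothetical.

```python
import numpy as np

def noisy_joint(data, cols, domain_sizes, epsilon, rng):
    """Laplace-noised contingency table over the attributes in `cols`."""
    shape = tuple(domain_sizes[c] for c in cols)
    counts = np.zeros(shape)
    np.add.at(counts, tuple(data[:, c] for c in cols), 1)
    # Assumption: replacing one tuple changes two cells by 1 each,
    # so the L1 sensitivity of the count table is 2.
    noisy = counts + rng.laplace(scale=2.0 / epsilon, size=shape)
    return np.clip(noisy, 0, None)  # negative counts are not meaningful

def sample_synthetic(network, joints, n_rows, n_cols, rng):
    """Sample rows attribute by attribute, following the network order."""
    synth = np.zeros((n_rows, n_cols), dtype=int)
    for (child, parents), joint in zip(network, joints):
        for i in range(n_rows):
            # Slice the noisy joint at the already-sampled parent values.
            weights = joint[tuple(synth[i, parents])]
            total = weights.sum()
            if total > 0:
                p = weights / total
            else:  # all mass clipped away: fall back to uniform
                p = np.full(len(weights), 1.0 / len(weights))
            synth[i, child] = rng.choice(len(weights), p=p)
    return synth

def privbayes_sketch(data, domain_sizes, network, epsilon, seed=0):
    """network: list of (child, parents) pairs in topological order."""
    rng = np.random.default_rng(seed)
    eps_each = epsilon / len(network)  # assumed even budget split
    joints = [
        noisy_joint(data, parents + [child], domain_sizes, eps_each, rng)
        for child, parents in network
    ]
    return sample_synthetic(network, joints, len(data), data.shape[1], rng)

# Toy usage: three categorical attributes arranged in a chain 0 -> 1 -> 2.
toy_rng = np.random.default_rng(1)
x0 = toy_rng.integers(0, 2, size=500)
x2 = toy_rng.integers(0, 3, size=500)
toy_data = np.column_stack([x0, x0, x2])  # attribute 1 copies attribute 0
toy_network = [(0, []), (1, [0]), (2, [1])]
toy_synth = privbayes_sketch(toy_data, [2, 2, 3], toy_network, epsilon=1.0)
```

Each noisy table here covers at most two attributes, so the noise is added to a handful of cells rather than to the full joint domain, which is the intuition the abstract gives for avoiding the curse of dimensionality.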

Updated: 2017-10-27
