P3GM: Private High-Dimensional Data Release via Privacy Preserving Phased Generative Model
arXiv - CS - Databases. Pub Date: 2020-06-22, DOI: arxiv-2006.12101
Shun Takagi, Tsubasa Takahashi, Yang Cao, Masatoshi Yoshikawa

How can we release a massive volume of sensitive data while mitigating privacy risks? Privacy-preserving data synthesis enables the data holder to outsource analytical tasks to an untrusted third party. The state-of-the-art approach to this problem is to build a generative model under differential privacy, which offers a rigorous privacy guarantee. However, existing methods cannot adequately handle high-dimensional data. In particular, when the input dataset contains a large number of features, existing techniques must inject a prohibitive amount of noise to satisfy differential privacy, which renders the outsourced data analysis meaningless. To address this issue, this paper proposes the privacy-preserving phased generative model (P3GM), a differentially private generative model for releasing such sensitive data. P3GM employs a two-phase learning process to make it robust against the injected noise and to increase learning efficiency (e.g., easier convergence). We give theoretical analyses of the learning complexity and privacy loss of P3GM. We further evaluate the proposed method experimentally and demonstrate that P3GM significantly outperforms existing solutions. Compared with the state-of-the-art methods, our generated samples look less noisy and are closer to the original data in terms of data diversity. Moreover, in several data mining tasks on the synthesized data, our model outperforms the competitors in terms of accuracy.
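The abstract only outlines the two-phase design, so the following is a minimal, hypothetical PyTorch sketch of what such training could look like: phase 1 fits an encoder, and phase 2 freezes it and trains a decoder with a DP-SGD-style update (per-example gradient clipping plus Gaussian noise). All names, sizes, and hyperparameters (DIM, LATENT, CLIP, SIGMA, LR, dp_sgd_step) are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

DIM, LATENT = 100, 10               # assumed feature / latent sizes
CLIP, SIGMA, LR = 1.0, 1.1, 1e-3    # assumed clipping norm, noise multiplier, step size

encoder = nn.Sequential(nn.Linear(DIM, LATENT), nn.Tanh())
decoder = nn.Sequential(nn.Linear(LATENT, DIM))

def dp_sgd_step(model, loss_fn, batch):
    """One DP-SGD-style step: clip each example's gradient, add Gaussian noise, update."""
    params = [p for p in model.parameters() if p.requires_grad]
    accum = [torch.zeros_like(p) for p in params]
    for x in batch:                                  # per-example gradients (microbatch of 1)
        model.zero_grad()
        loss_fn(x.unsqueeze(0)).backward()
        norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in params))
        scale = torch.clamp(CLIP / (norm + 1e-12), max=1.0)
        for a, p in zip(accum, params):
            a += p.grad * scale
    with torch.no_grad():
        for a, p in zip(accum, params):
            noise = torch.normal(0.0, SIGMA * CLIP, size=a.shape)
            p -= LR * (a + noise) / len(batch)

data = torch.randn(64, DIM)                          # placeholder data
recon = nn.MSELoss()
loss = lambda x: recon(decoder(encoder(x)), x)

# Phase 1 (illustrative): train the encoder privately on a reconstruction objective.
dp_sgd_step(encoder, loss, data)

# Phase 2 (illustrative): freeze the encoder and privately train only the decoder.
for p in encoder.parameters():
    p.requires_grad_(False)
dp_sgd_step(decoder, loss, data)

Freezing the encoder in the second phase is one plausible reading of the "phased" idea: the decoder is then optimized against a fixed representation, which tends to converge more easily under the injected noise. Accounting for the total privacy budget across both phases (e.g., with a moments accountant) is omitted from this sketch.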

Updated: 2020-11-05