

IHP: improving the utility in differential private histogram publication
Distributed and Parallel Databases (IF 1.2), Pub Date: 2019-01-02, DOI: 10.1007/s10619-018-07255-6
Hui Li, Jiangtao Cui, Xue Meng, Jianfeng Ma

Differential privacy (DP) is a promising tool for preserving privacy during data publication, as it provides strong theoretical privacy guarantees in the face of adversaries with arbitrary background knowledge. A histogram, as the result of a set of count queries, serves as a core statistical tool for reporting data distributions and is viewed as the foundation for many other statistical analyses, such as range queries. It is an important form of data publishing. In this paper, we consider the scenario of publishing sensitive histogram data under a differential privacy scheme. Existing work in this field has shown that, compared with directly applying DP techniques (i.e., injecting noise) to the counts in histogram bins, grouping bins before noise injection is more effective (i.e., yields higher utility), as it introduces much less error into the sanitized histogram given the same privacy budget. However, state-of-the-art works have not revealed how the overall utility of a sanitized histogram is affected by the way the privacy budget is divided between the grouping and noise injection phases. In this work, we conduct a theoretical study of how the probability of obtaining better groups can be improved so that the overall error introduced into the sanitized histogram can be further reduced, which directly leads to higher utility. In particular, we show that the probability of achieving a better grouping is affected by two factors, namely the privacy budget assigned to grouping and the normalized utility function used for selecting groups. Motivated by that, we propose a new DP histogram publishing scheme, namely Iterative Histogram Partition (IHP), in which we carefully divide the privacy budget between the grouping and injection phases based on our theoretical study. We also theoretically prove that our new scheme achieves $\epsilon$-differential privacy. Moreover, we show that, under the same privacy budget, our scheme exhibits less error in the sanitized histograms compared with state-of-the-art methods. We also extend the model to multi-dimensional histogram publication. Finally, an empirical study over four real-world datasets confirms that our scheme achieves the least error among a series of state-of-the-art baseline methods.
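The two-phase idea in the abstract (spend part of the privacy budget on choosing a grouping, the rest on noise injection) can be illustrated with a minimal Python sketch. This is not the IHP algorithm, whose details the abstract does not give. It pairs the standard Laplace mechanism with the standard exponential mechanism, and every name in it (`publish`, `grouping_utility`, `eps_group`, `eps_noise`) is hypothetical. The utility score and its sensitivity bound are assumptions supplied by the caller, not values taken from the paper.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_histogram(counts, epsilon, rng):
    """Baseline: add Laplace(1/epsilon) noise to every bin (count sensitivity 1)."""
    return [c + laplace_noise(1.0 / epsilon, rng) for c in counts]


def grouped_histogram(counts, groups, epsilon, rng):
    """Add noise once per group sum, then publish the noisy group mean in each bin."""
    out = [0.0] * len(counts)
    for start, end in groups:  # half-open, disjoint bin ranges
        noisy_sum = sum(counts[start:end]) + laplace_noise(1.0 / epsilon, rng)
        for i in range(start, end):
            out[i] = noisy_sum / (end - start)
    return out


def grouping_utility(counts, groups):
    """Assumed score: negative absolute error of replacing bins by their group mean."""
    err = 0.0
    for start, end in groups:
        mean = sum(counts[start:end]) / (end - start)
        err += sum(abs(c - mean) for c in counts[start:end])
    return -err


def exp_mech_probs(utilities, epsilon, sensitivity):
    """Exponential mechanism: P(i) proportional to exp(eps * u_i / (2 * sensitivity))."""
    top = max(utilities)  # shift by the maximum for numerical stability
    weights = [math.exp(epsilon * (u - top) / (2.0 * sensitivity)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def publish(counts, candidates, eps_group, eps_noise, sensitivity, rng):
    """Spend eps_group on selecting a grouping and eps_noise on noise injection."""
    utilities = [grouping_utility(counts, g) for g in candidates]
    probs = exp_mech_probs(utilities, eps_group, sensitivity)
    chosen = rng.choices(candidates, weights=probs, k=1)[0]
    return chosen, grouped_histogram(counts, chosen, eps_noise, rng)


# Usage: all candidates use two groups, so the score only compares split points.
rng = random.Random(0)
counts = [50, 52, 49, 51, 5, 4, 6, 5]
candidates = [[(0, s), (s, len(counts))] for s in range(1, len(counts))]
chosen, sanitized = publish(
    counts, candidates, eps_group=0.2, eps_noise=0.8,
    sensitivity=2.0,  # assumed bound on the utility's sensitivity, not derived here
    rng=rng,
)
print(chosen, [round(v, 1) for v in sanitized])
```

In this sketch the total budget is `eps_group + eps_noise` by sequential composition. A larger `eps_group` makes a low-error grouping more likely to be selected but leaves less budget for the noise phase, which is the trade-off the abstract says the paper analyzes.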


Updated: 2019-01-02
