Answering Multi-Dimensional Range Queries under Local Differential Privacy
arXiv - CS - Databases. Pub Date: 2020-09-14, DOI: arxiv-2009.06538
Jianyu Yang, Tianhao Wang, Ninghui Li, Xiang Cheng, Sen Su

In this paper, we tackle the problem of answering multi-dimensional range queries under local differential privacy. There are three key technical challenges: capturing the correlations among attributes, avoiding the curse of dimensionality, and dealing with the large domains of attributes. None of the existing approaches satisfactorily deals with all three challenges. To overcome them, we first propose an approach called Two-Dimensional Grids (TDG). Its main idea is to use binning to partition the two-dimensional (2-D) domains of all attribute pairs into 2-D grids that can answer all 2-D range queries, and then to estimate the answer to a higher-dimensional range query from the answers to the associated 2-D range queries. However, to reduce the error due to noise, each attribute in a 2-D grid must be binned at a coarse granularity, which loses fine-grained distribution information about individual attributes. To correct this deficiency, we further propose Hybrid-Dimensional Grids (HDG), which also introduces 1-D grids to capture finer-grained information on the distribution of each individual attribute and combines information from the 1-D and 2-D grids to answer range queries. To make HDG consistently effective, we provide a guideline for choosing grid granularities, based on an analysis of how these choices affect the different sources of error. Extensive experiments on real and synthetic datasets show that HDG gives a significant improvement over existing approaches.
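
The 2-D grid step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name a frequency oracle, so generalized randomized response (GRR) is swapped in here as an assumed LDP primitive, and all function names, the square domain, and the uniformity assumption inside each cell are hypothetical choices made for illustration. The sketch covers a single attribute pair only; estimating higher-dimensional answers from several 2-D answers, and the 1-D grids used by HDG, are left out.

```python
import math
import random


def cell_index(x, y, domain, g):
    """Map a point (x, y) in [0, domain)^2 to a cell id in a g x g grid."""
    width = domain / g
    return int(x // width) * g + int(y // width)


def grr_perturb(value, k, epsilon, rng):
    """Generalized randomized response over k categories (assumed primitive).

    Reports the true value with probability p, otherwise one of the
    other k - 1 values uniformly at random.
    """
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    other = rng.randrange(k - 1)
    return other if other < value else other + 1


def grr_estimate(reports, k, epsilon):
    """Unbiased per-cell frequency estimates from GRR reports."""
    n = len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    q = 1.0 / (math.exp(epsilon) + k - 1)
    counts = [0] * k
    for r in reports:
        counts[r] += 1
    return [(c / n - q) / (p - q) for c in counts]


def answer_2d_range(freqs, domain, g, x_lo, x_hi, y_lo, y_hi):
    """Answer [x_lo, x_hi) x [y_lo, y_hi) from grid cell frequencies.

    Partially covered cells contribute in proportion to the covered
    area, i.e. values are assumed uniform inside each cell.
    """
    width = domain / g
    total = 0.0
    for i in range(g):
        overlap_x = max(0.0, min(x_hi, (i + 1) * width) - max(x_lo, i * width))
        if overlap_x == 0.0:
            continue
        for j in range(g):
            overlap_y = max(0.0, min(y_hi, (j + 1) * width) - max(y_lo, j * width))
            total += freqs[i * g + j] * (overlap_x / width) * (overlap_y / width)
    return total


# Usage: 2000 simulated users, a 64 x 64 domain, and a coarse 4 x 4 grid.
rng = random.Random(0)
points = [(rng.randrange(64), rng.randrange(64)) for _ in range(2000)]
reports = [grr_perturb(cell_index(x, y, 64, 4), 16, 1.0, rng) for x, y in points]
freqs = grr_estimate(reports, 16, 1.0)
estimate = answer_2d_range(freqs, 64, 4, 0, 40, 8, 32)
```

A coarser grid (smaller `g`) means fewer cells share the privacy budget's noise, but more of a query's answer then rests on the within-cell uniformity assumption. This is the trade-off the abstract refers to when it motivates adding 1-D grids and a guideline for choosing granularities.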

Updated: 2020-09-15