Differentially private approximate aggregation based on feature selection
Journal of Combinatorial Optimization (IF 1), Pub Date: 2021-01-02, DOI: 10.1007/s10878-020-00666-1
Zaobo He, Akshita Maradapu Vera Venkata Sai, Yan Huang, Daehee Seo, Hanzhou Zhang, Qilong Han

Privacy-preserving data aggregation is an important problem that has attracted extensive study. The state-of-the-art technique for this problem is differential privacy, which offers a strong privacy guarantee without making strong assumptions about the attacker. However, existing solutions cannot effectively answer aggregate queries over high-dimensional datasets under a differential privacy guarantee. In particular, when the input dataset has a large number of dimensions, existing solutions must inject a large amount of noise into the returned aggregates. To address this issue, this paper proposes an algorithm for querying differentially private approximate aggregates from high-dimensional datasets. Given a dataset D, our algorithm first develops an \(\varepsilon '\)-differentially private feature selection method based on a data sampling process over a kd-tree, which allows us to obtain a differentially private low-dimensional dataset with representative instances. After that, our algorithm draws independent samples from the kd-tree in order to obtain \((\alpha ', \delta ')\)-approximate aggregates. Finally, a model is proposed to determine the relationship between the privacy and utility budgets, such that the final aggregate still satisfies the accuracy requirements specified by data consumers. Intuitively, the proposed algorithm circumvents the dilemma posed by both the dimensionality and the height threshold of the kd-tree, as it samples a low-dimensional dataset S and queries aggregates from S instead of from the kd-tree. Satisfying user-specified privacy and utility budgets after multi-stage approximation is significantly challenging, and we present a novel model to determine how the parameters relate.
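To make the dimensionality problem in the abstract concrete, the following minimal Python sketch shows two textbook building blocks; it is not the paper's algorithm. It assumes the standard Laplace mechanism with a privacy budget split evenly across dimensions (sequential composition) and a per-dimension sensitivity of 1, and it uses a Hoeffding bound as one possible reading of an \((\alpha, \delta)\)-approximate mean over values in [0, 1]. All function names are hypothetical.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def per_dimension_scale(epsilon, num_dims, sensitivity=1.0):
    """Noise scale per query when epsilon is split evenly over num_dims queries.

    Assumption: each dimension gets epsilon / num_dims, so the Laplace scale
    sensitivity / (epsilon / num_dims) grows linearly with num_dims.
    """
    return sensitivity * num_dims / epsilon


def noisy_counts(counts, epsilon, rng):
    """Release one noisy count per dimension under a total budget epsilon."""
    scale = per_dimension_scale(epsilon, len(counts))
    return [c + laplace_noise(scale, rng) for c in counts]


def hoeffding_sample_size(alpha, delta):
    """Samples needed so that the mean of i.i.d. values in [0, 1] is within
    additive error alpha of the true mean with probability >= 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * alpha ** 2))


if __name__ == "__main__":
    rng = random.Random(0)
    for d in (10, 100, 1000):
        print(d, per_dimension_scale(epsilon=1.0, num_dims=d))
    print(noisy_counts([120, 45, 300], epsilon=1.0, rng=rng))
    print(hoeffding_sample_size(alpha=0.05, delta=0.05))
```

Under these assumptions the noise scale grows linearly with the number of dimensions, which is the effect that motivates reducing dimensionality before querying. The sample-size bound, by contrast, does not depend on the dataset size, which is one reason sampling-based approximate aggregation is attractive.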




Updated: 2021-01-02