Selecting Optimal Subset to Release under Differentially Private M-estimators from Hybrid Datasets
IEEE Transactions on Knowledge and Data Engineering (IF 8.9), Pub Date: 2018-03-01, DOI: 10.1109/tkde.2017.2773545
Meng Wang 1, Zhanglong Ji 2, Hyeon-Eui Kim 2, Shuang Wang 2, Li Xiong 3, Xiaoqian Jiang 2

Privacy concerns in data sharing, especially for health data, are receiving increasing attention. Some patients now agree to open their information for research use, which raises a new question: how can this public information be used effectively to better understand a private dataset without breaching privacy? In this paper, we specialize this question to selecting an optimal subset of the public dataset for M-estimators in the framework of differential privacy (DP) [1]. From the perspective of non-interactive learning, we first construct a weighted private density estimate from the hybrid datasets under DP. Along the same lines as [2], we analyze the accuracy of the DP M-estimators based on the hybrid datasets. Our main contributions are: (i) we find that the bias-variance tradeoff in the performance of our M-estimators can be characterized in terms of the sample size of the released dataset; (ii) based on this finding, we develop an algorithm to select the optimal subset of the public dataset to release under DP. Our simulation studies and application to real datasets confirm our findings and provide a guideline for practical applications.
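
As a rough illustration of what a private density estimate under DP can look like, the sketch below applies the standard Laplace mechanism to a histogram. It is not the weighted hybrid-dataset estimator of the paper, which the abstract does not specify; the function names, the bin range, and the choice of a histogram are all assumptions made for illustration only.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_histogram_density(data, lo, hi, bins, epsilon, rng):
    """Hypothetical helper: epsilon-DP histogram density estimate on [lo, hi)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        if lo <= x < hi:
            counts[min(int((x - lo) / width), bins - 1)] += 1
    # Replacing one record changes at most two bin counts by 1 each,
    # so the L1 sensitivity of the count vector is at most 2.
    noisy = [max(c + laplace_noise(2.0 / epsilon, rng), 0.0) for c in counts]
    total = sum(noisy)
    if total == 0.0:
        # Every noisy count was clipped to zero: fall back to a uniform density.
        return [1.0 / (hi - lo)] * bins
    # Clipping and normalizing are post-processing and do not affect the DP guarantee.
    return [c / (total * width) for c in noisy]


rng = random.Random(0)
private_sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]
density = dp_histogram_density(private_sample, -4.0, 4.0, 16, epsilon=1.0, rng=rng)
```

The noise scale depends only on `epsilon`, not on the number of records, so its relative effect on the normalized estimate shrinks as the sample grows. This is a property of the generic mechanism and is separate from the subset-selection tradeoff that the paper analyzes.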

Updated: 2018-03-01