Heterogeneous data release for cluster analysis with differential privacy
Knowledge-Based Systems ( IF 8.8 ) Pub Date : 2020-05-20 , DOI: 10.1016/j.knosys.2020.106047
Rong Wang , Benjamin C.M. Fung , Yan Zhu

Many models have been proposed to preserve data privacy in different data publishing scenarios. Among these models, ϵ-differential privacy has drawn increasing attention in recent years because of its rigorous privacy guarantees. Many existing solutions based on ϵ-differential privacy handle relational data and set-valued data separately, yet most real-life data, such as electronic health records, are heterogeneous. Privacy protection for heterogeneous data has not been widely studied. Furthermore, many existing works on privacy protection aim to preserve utility for frequent itemset mining or classification analysis, and few have focused on data publication for cluster analysis. In this paper, we propose the first differentially private solution to release heterogeneous data for cluster analysis. The challenge is how to mask raw data without any explicit guidance. Our approach addresses this challenge by converting the clustering problem into a classification problem, in which class labels encode the cluster structure of the raw data and guide the masking process. The approach generalizes the raw data probabilistically and adds noise to satisfy ϵ-differential privacy. Through extensive experiments on real-life datasets, we validate the performance of our approach.
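The noise-addition step mentioned in the abstract can be illustrated with the standard Laplace mechanism applied to counts of generalized groups. The sketch below is not the authors' algorithm: the function names, the group labels, and the choice of a fixed, data-independent group domain are all assumptions made for illustration. It assumes neighboring datasets differ by adding or removing one record, so a histogram over disjoint groups has sensitivity 1.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_counts(group_labels, domain, epsilon, rng=None):
    """Return a noisy count for every group in a fixed domain (hypothetical helper).

    group_labels: the generalized group assigned to each record.
    domain: all possible groups, fixed independently of the data, so that
            empty groups also receive a noisy count.
    epsilon: privacy budget spent on this histogram.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    scale = 1.0 / epsilon  # sensitivity 1 divided by epsilon
    counts = Counter(group_labels)
    return {g: counts.get(g, 0) + laplace_noise(scale, rng) for g in domain}


if __name__ == "__main__":
    # Hypothetical generalized groups, e.g. (age range, cluster-derived class label).
    labels = ["20-39/c0", "20-39/c0", "40-59/c1"]
    domain = ["20-39/c0", "20-39/c1", "40-59/c0", "40-59/c1"]
    print(noisy_counts(labels, domain, epsilon=1.0))
```

A smaller epsilon gives a larger noise scale and therefore stronger privacy at the cost of less accurate counts. The probabilistic generalization step described in the abstract is not shown here.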




Updated: 2020-05-20