Privacy-preserving High-dimensional Data Publishing for Classification
Computers & Security ( IF 4.8 ) Pub Date : 2020-06-01 , DOI: 10.1016/j.cose.2020.101785
Rong Wang , Yan Zhu , Chin-Chen Chang , Qiang Peng

Abstract With increasing amounts of personal information being collected by various organizations, many privacy models have been proposed for masking the collected data so that it can be published without disclosing individuals' private information. However, most existing privacy models are not applicable to high-dimensional data because of the sparseness of the high-dimensional search space. In this paper, we present a solution for releasing high-dimensional data that supports both privacy preservation and classification analysis. The challenge is how to reduce the dimensionality seen by the privacy model while preserving as much information as possible for classification. Our approach tackles this with vertical partitioning: the raw data are divided vertically into disjoint subsets of smaller dimensionality. Specifically, our partition metric considers both the correlation between attributes and the proportion of attributes in each subset. A generalization method based on local recoding is then applied to each subset separately to achieve k-anonymity. Because computing an optimal k-anonymization is hard, the local recoding method seeks a near-optimal solution with the goal of improving efficiency. The proposed approach was evaluated on two datasets, and the experimental results showed that it outperformed two related approaches in data utility at the same privacy level.
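To make the sparseness argument concrete, the following minimal sketch checks k-anonymity on a toy table, first over all attributes and then over each subset of a vertical partition. It is an illustration only, not the paper's algorithm: the records, the attribute names, the hand-picked partition, and the helper `is_k_anonymous` are all assumptions introduced here, and the paper's partition metric and local recoding step are not reproduced.

```python
from collections import Counter


def is_k_anonymous(rows, attrs, k):
    """Return True if every value combination over `attrs` occurs in at least k rows."""
    groups = Counter(tuple(row[a] for a in attrs) for row in rows)
    return all(count >= k for count in groups.values())


# Toy, made-up records (hypothetical; not taken from the paper's datasets).
rows = [
    {"age": "30-39", "zip": "021**", "sex": "F", "job": "clerk"},
    {"age": "30-39", "zip": "021**", "sex": "M", "job": "nurse"},
    {"age": "40-49", "zip": "100**", "sex": "F", "job": "clerk"},
    {"age": "40-49", "zip": "100**", "sex": "M", "job": "nurse"},
]

all_attrs = ["age", "zip", "sex", "job"]

# A hand-picked vertical partition into disjoint attribute subsets (assumption).
partition = [["age", "zip"], ["sex", "job"]]

# Over all four attributes every record is unique, so 2-anonymity fails.
full_ok = is_k_anonymous(rows, all_attrs, 2)

# Within each lower-dimensional subset, every combination appears twice.
parts_ok = [is_k_anonymous(rows, subset, 2) for subset in partition]

print("full table 2-anonymous:", full_ok)
print("per-subset 2-anonymous:", parts_ok)
```

In this toy case the full table would need further generalization to reach 2-anonymity, while each subset already satisfies it, which is the intuition behind reducing dimensionality before anonymizing.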

Updated: 2020-06-01