Privacy preserving classification over differentially private data, WIREs Data Mining and Knowledge Discovery



Privacy preserving classification over differentially private data
WIREs Data Mining and Knowledge Discovery (IF 6.4), Pub Date: 2020-12-13, DOI: 10.1002/widm.1399
Ezgi Zorarpacı, Selma Ayşe Özel


Privacy preserving data classification is an important research area in the data mining field. The goal of a privacy preserving classification algorithm is to protect sensitive information as much as possible while still providing satisfactory classification accuracy. Differential privacy is a strong privacy guarantee that protects sensitive data stored in a database by determining the ratio of sensitive information leakage with respect to a parameter ɛ. In this study, our aim is to investigate the classification performance of state‐of‐the‐art classification algorithms such as C4.5, Naïve Bayes, One Rule, Bayesian Networks, PART, Ripper, K*, IBk, and Random Tree for privacy preserving classification. To preserve the privacy of the data to be classified, we applied the input perturbation technique from differential privacy and observed the relationship between the ɛ parameter values and the accuracy of the classifiers. To the best of our knowledge, this article is the first study that analyzes the performance of well‐known classification algorithms over differentially private data and identifies which datasets are more suitable for privacy preserving classification when input perturbation is applied to provide data privacy. The classification algorithms are compared using differentially private versions of well‐known datasets from the UCI repository. According to the experimental results, as the ɛ parameter value increases, better classification accuracies are achieved, at the cost of lower privacy levels. When the classifiers are compared, the Naïve Bayes classifier is the most successful method. The ɛ parameter should be greater than or equal to 2 (i.e., ɛ ≥ 2) to achieve satisfactory classification accuracies.
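The trade-off between ɛ and accuracy described in the abstract can be illustrated with a minimal sketch of input perturbation. The abstract does not say which noise mechanism the authors used, so this sketch assumes the standard Laplace mechanism applied to a single numeric attribute with known bounds. The names `laplace_noise`, `perturb_column`, and `mean_abs_error` are hypothetical helpers introduced here, and the sketch ignores how the privacy budget would be split across several attributes.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent Exp(1) variables is Laplace(0, 1).
    """
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def perturb_column(values, epsilon, lower, upper, rng=None):
    """Return a noisy copy of one numeric attribute.

    Assumption (not from the paper): Laplace mechanism with sensitivity
    equal to the attribute range, so the noise scale is (upper - lower) / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    scale = (upper - lower) / epsilon
    # Clamp each value to the declared bounds before adding noise.
    return [min(max(v, lower), upper) + laplace_noise(scale, rng) for v in values]


def mean_abs_error(original, noisy):
    """Average absolute distortion introduced by the perturbation."""
    return sum(abs(a - b) for a, b in zip(original, noisy)) / len(original)


if __name__ == "__main__":
    rng = random.Random(42)
    column = [rng.random() for _ in range(1000)]  # synthetic attribute in [0, 1]
    for eps in (0.5, 1.0, 2.0, 4.0):
        noisy = perturb_column(column, eps, 0.0, 1.0, rng)
        print(f"epsilon={eps}: mean abs error={mean_abs_error(column, noisy):.3f}")
```

Under these assumptions the noise scale shrinks in proportion to 1/ɛ, which is consistent with the reported observation that larger ɛ values give better accuracy at lower privacy levels. A classifier would then be trained on the perturbed attributes rather than on the original data.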


Updated: 2020-12-13
