Differentially Private Distance Learning in Categorical Data, Data Mining and Knowledge Discovery

Data Mining and Knowledge Discovery (IF 4.8), Pub Date: 2021-07-13, DOI: 10.1007/s10618-021-00778-0
Elena Battaglia, Simone Celano, Ruggero G. Pensa

Most privacy-preserving machine learning methods are designed around continuous or numeric data, but categorical attributes are common in many application scenarios, including clinical and health records, census and survey data. Distance-based methods, in particular, have limited applicability to categorical data, since they do not capture the complexity of the relationships among different values of a categorical attribute. Although distance learning algorithms exist for categorical data, they may disclose private information about individual records if applied to a secret dataset. To address this problem, we introduce a differentially private family of algorithms for learning distances between any pair of values of a categorical attribute according to the way they are co-distributed with the values of other categorical attributes forming the so-called context. We define different variants of our algorithm and show empirically that our approach consumes little privacy budget while providing accurate distances, making it suitable for distance-based applications such as clustering and classification.
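To make the idea of a context-based, differentially private distance concrete, here is a minimal sketch. The abstract does not say which noise mechanism or context-selection rule the paper uses, so everything below is an assumption for illustration: the context is simply every other attribute, attribute domains are treated as public, the Laplace mechanism is applied to each target-by-context contingency table (each record lands in one cell, so sensitivity is 1), and the budget `epsilon` is split evenly across tables by sequential composition. Each value y of the target attribute is represented by the conditional probabilities P(y | x) over the context values x, and the distance between two values is the normalized Euclidean distance between those profiles. The function names `noisy_contingency` and `private_context_distance` are hypothetical, not the authors' API.

```python
import numpy as np


def noisy_contingency(data, target, ctx, domain_sizes, epsilon, rng):
    """Laplace-perturbed co-occurrence counts of (target value, context value).

    Each record falls in exactly one cell, so adding or removing one record
    changes the table by 1 in L1 norm (sensitivity 1).
    """
    table = np.zeros((domain_sizes[target], domain_sizes[ctx]))
    np.add.at(table, (data[:, target], data[:, ctx]), 1.0)
    table += rng.laplace(0.0, 1.0 / epsilon, size=table.shape)
    # Clipping is post-processing, so it does not consume extra privacy budget.
    return np.clip(table, 0.0, None)


def private_context_distance(data, target, domain_sizes, epsilon, rng):
    """Distance matrix between the values of attribute `target`.

    Hypothetical assumptions: every other attribute is the context, domains
    are public, and epsilon is split evenly across the noisy tables
    (sequential composition), so the whole call is epsilon-DP.
    """
    ctx_attrs = [a for a in range(data.shape[1]) if a != target]
    eps_each = epsilon / len(ctx_attrs)
    k = domain_sizes[target]
    sq_sum = np.zeros((k, k))
    n_ctx_values = 0
    for c in ctx_attrs:
        table = noisy_contingency(data, target, c, domain_sizes, eps_each, rng)
        col_sums = table.sum(axis=0, keepdims=True)
        # P(target value | context value); uniform if a column is empty after noise.
        cond = np.divide(table, col_sums, out=np.full_like(table, 1.0 / k),
                         where=col_sums > 0)
        # Squared differences between the profiles of every pair of target values.
        diff = cond[:, None, :] - cond[None, :, :]
        sq_sum += (diff ** 2).sum(axis=2)
        n_ctx_values += domain_sizes[c]
    return np.sqrt(sq_sum / n_ctx_values)


# Toy usage: integer-coded records with three attributes (domain sizes 3, 2, 2).
# Target values 0 and 1 share the same context, value 2 does not.
rng = np.random.default_rng(0)
data = np.array([[0, 0, 0]] * 5 + [[1, 0, 0]] * 5 + [[2, 1, 1]] * 5)
D = private_context_distance(data, target=0, domain_sizes=[3, 2, 2],
                             epsilon=1.0, rng=rng)
print(D.round(3))
```

Smaller `epsilon` values add more noise to the counts and so blur the learned distances. This trade-off between privacy budget and distance accuracy is the one the abstract reports on empirically. A real implementation would also need to account for the budget spent on selecting the context, which this sketch skips.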

Updated: 2021-07-13