Quantifying identifiability to choose and audit $ε$ in differentially private deep learning
arXiv - CS - Machine Learning Pub Date : 2021-03-04 , DOI: arxiv-2103.02913 Daniel Bernau, Günther Eibl, Philip W. Grassal, Hannah Keller, Florian Kerschbaum
Differential privacy allows bounding the influence that training data records
have on a machine learning model. To use differential privacy in machine
learning, data scientists must choose privacy parameters $(\epsilon,\delta)$.
Choosing meaningful privacy parameters is key since models trained with weak
privacy parameters might result in excessive privacy leakage, while strong
privacy parameters might overly degrade model utility. However, privacy
parameter values are difficult to choose for two main reasons. First, the upper
bound on privacy loss $(\epsilon,\delta)$ might be loose, depending on the
chosen sensitivity and data distribution of practical datasets. Second, legal
requirements and societal norms for anonymization often refer to individual
identifiability, to which $(\epsilon,\delta)$ are only indirectly related. Prior work has proposed membership inference adversaries to guide the choice
of $(\epsilon,\delta)$. However, these adversaries are weaker than the
adversary assumed by differential privacy and cannot empirically reach the
upper bounds on privacy loss defined by $(\epsilon,\delta)$. Therefore, no
quantification of a membership inference attack holds the exact meaning that
$(\epsilon,\delta)$ does. We transform $(\epsilon,\delta)$ to a bound on the
Bayesian posterior belief of the adversary assumed by differential privacy
concerning the presence of any record in the training dataset. The bound holds
for multidimensional queries under composition, and we show that it can be
tight in practice. Furthermore, we derive an identifiability bound, which
relates the adversary assumed in differential privacy to previous work on
membership inference adversaries. We formulate an implementation of this
differential privacy adversary that allows data scientists to audit model
training and compute empirical identifiability scores and empirical
$(\epsilon,\delta)$.
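The paper's own identifiability bound is derived in the full text; as a rough illustration of how a privacy parameter translates into an adversary's posterior belief, the following sketch uses the standard Bayes bound for pure $\epsilon$-DP (the $\delta = 0$ case): with prior $p$ that a record is in the training set, the posterior is at most $p e^{\epsilon} / (p e^{\epsilon} + 1 - p)$. This is a textbook relation, not the paper's exact multidimensional composition bound.

```python
import math

def posterior_belief_bound(epsilon: float, prior: float = 0.5) -> float:
    """Upper bound on the DP adversary's posterior belief that a given
    record is present in the training set, under pure epsilon-DP.

    Standard Bayes bound: p * e^eps / (p * e^eps + (1 - p)).
    With a uniform prior (p = 0.5) this simplifies to e^eps / (e^eps + 1).
    """
    lifted = prior * math.exp(epsilon)
    return lifted / (lifted + (1.0 - prior))

# How the bound grows with epsilon, assuming an uninformed (0.5) prior.
for eps in (0.1, 1.0, 8.0):
    print(f"epsilon = {eps}: posterior belief <= {posterior_belief_bound(eps):.4f}")
```

At $\epsilon = 0$ the adversary learns nothing (posterior stays at the prior), while large $\epsilon$ drives the bound toward 1, which is one way to read "identifiability" off a privacy budget.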
Updated: 2021-03-05