Quantifying identifiability to choose and audit $ε$ in differentially private deep learning
arXiv - CS - Machine Learning Pub Date : 2021-03-04 , DOI: arxiv-2103.02913 Daniel Bernau, Günther Eibl, Philip W. Grassal, Hannah Keller, Florian Kerschbaum
Differential privacy allows bounding the influence that training data records
have on a machine learning model. To use differential privacy in machine
learning, data scientists must choose privacy parameters $(\epsilon,\delta)$.
Choosing meaningful privacy parameters is key since models trained with weak
privacy parameters might result in excessive privacy leakage, while strong
privacy parameters might overly degrade model utility. However, privacy
parameter values are difficult to choose for two main reasons. First, the upper
bound on privacy loss $(\epsilon,\delta)$ might be loose, depending on the
chosen sensitivity and data distribution of practical datasets. Second, legal
requirements and societal norms for anonymization often refer to individual
identifiability, to which $(\epsilon,\delta)$ are only indirectly related. Prior work has proposed membership inference adversaries to guide the choice
of $(\epsilon,\delta)$. However, these adversaries are weaker than the
adversary assumed by differential privacy and cannot empirically reach the
upper bounds on privacy loss defined by $(\epsilon,\delta)$. Therefore, no
quantification of a membership inference attack holds the exact meaning that
$(\epsilon,\delta)$ does. We transform $(\epsilon,\delta)$ to a bound on the
Bayesian posterior belief of the adversary assumed by differential privacy
concerning the presence of any record in the training dataset. The bound holds
for multidimensional queries under composition, and we show that it can be
tight in practice. Furthermore, we derive an identifiability bound, which
relates the adversary assumed in differential privacy to previous work on
membership inference adversaries. We formulate an implementation of this
differential privacy adversary that allows data scientists to audit model
training and compute empirical identifiability scores and empirical
$(\epsilon,\delta)$.
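The paper's own identifiability bound is derived in the full text; as a rough illustration of how a privacy parameter translates into an adversary's posterior belief, the following sketch uses the standard Bayes bound for pure $\epsilon$-DP (the $\delta = 0$ case): with prior $p$ that a record is in the training set, the posterior is at most $p e^{\epsilon} / (p e^{\epsilon} + 1 - p)$. This is a textbook relation, not the paper's exact multidimensional composition bound.

```python
import math

def posterior_belief_bound(epsilon: float, prior: float = 0.5) -> float:
    """Upper bound on the DP adversary's posterior belief that a given
    record is present in the training set, under pure epsilon-DP.

    Standard Bayes bound: p * e^eps / (p * e^eps + (1 - p)).
    With a uniform prior (p = 0.5) this simplifies to e^eps / (e^eps + 1).
    """
    lifted = prior * math.exp(epsilon)
    return lifted / (lifted + (1.0 - prior))

# How the bound grows with epsilon, assuming an uninformed (0.5) prior.
for eps in (0.1, 1.0, 8.0):
    print(f"epsilon = {eps}: posterior belief <= {posterior_belief_bound(eps):.4f}")
```

At $\epsilon = 0$ the adversary learns nothing (posterior stays at the prior), while large $\epsilon$ drives the bound toward 1, which is one way to read "identifiability" off a privacy budget.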
Updated: 2021-03-05