On the privacy-utility trade-off in differentially private hierarchical text classification
arXiv - CS - Computation and Language. Pub Date: 2021-03-04, DOI: arxiv-2103.02895
Dominik Wunderlich, Daniel Bernau, Francesco Aldà, Javier Parra-Arnau, Thorsten Strufe

Hierarchical models for text classification can leak sensitive or confidential training data to adversaries due to training data memorization. Applying differential privacy during model training can mitigate leakage attacks against trained models by perturbing the training optimizer. However, a multiplicity of model architectures is available for hierarchical text classification, and it is unclear whether some architectures yield a better trade-off between remaining model accuracy and model leakage under differentially private training than others. We use a white-box membership inference attack to assess the information leakage of three widely used neural network architectures for hierarchical text classification under differential privacy. We show that relatively weak differential privacy guarantees already suffice to completely mitigate the membership inference attack, resulting in only a moderate decrease in utility. More specifically, for large datasets with long texts we observe that transformer-based models achieve an overall favorable privacy-utility trade-off, while for smaller datasets with shorter texts CNNs are preferable.
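The "perturbing the training optimizer" the abstract refers to is the standard DP-SGD recipe (Abadi et al., 2016): clip each example's gradient to bound its influence, then add Gaussian noise calibrated to that clipping bound before the parameter update. Below is a minimal PyTorch sketch of a single such step; it is an illustration of the general mechanism, not the paper's implementation, and all hyperparameter names and values (clip_norm, noise_multiplier, lr) are illustrative.

```python
import torch

def dp_sgd_step(model, loss_fn, xb, yb, lr=0.1, clip_norm=1.0, noise_multiplier=1.1):
    """One differentially private SGD step: clip each example's gradient
    to clip_norm, sum, add Gaussian noise, then take an averaged step.
    Assumes every parameter receives a gradient from the loss."""
    summed = [torch.zeros_like(p) for p in model.parameters()]
    for x, y in zip(xb, yb):
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        grads = [p.grad.detach().clone() for p in model.parameters()]
        # Clip the per-example gradient so no single example's
        # contribution exceeds clip_norm (the sensitivity bound).
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = (clip_norm / (total_norm + 1e-6)).clamp(max=1.0)
        for s, g in zip(summed, grads):
            s.add_(g * scale)
    with torch.no_grad():
        for p, s in zip(model.parameters(), summed):
            # Gaussian noise calibrated to the clipping bound, added once
            # per batch to the summed gradient before averaging.
            noise = torch.randn_like(p) * (noise_multiplier * clip_norm)
            p.add_(s + noise, alpha=-lr / len(xb))
```

The overall (epsilon, delta) guarantee then follows from composing many such noisy steps over subsampled batches, typically tracked with a privacy accountant; libraries such as Opacus package this loop and the accounting together.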

Updated: 2021-03-05