The Cost of Privacy in Asynchronous Differentially-Private Machine Learning
IEEE Transactions on Information Forensics and Security (IF 6.3), Pub Date: 2021-01-13, DOI: 10.1109/tifs.2021.3050603
Farhad Farokhi, Nan Wu, David Smith, Mohamed Ali Kaafar

We consider training machine-learning models using data located on multiple private and geographically-scattered servers with different privacy settings. Due to the distributed nature of the data, communicating with all collaborating private data owners simultaneously may prove challenging or altogether impossible. We consider differentially-private asynchronous algorithms for collaboratively training machine-learning models on multiple private datasets. The asynchronous nature of the algorithms implies that a central learner interacts with the private data owners one-on-one whenever they are available for communication, without needing to aggregate query responses to construct gradients of the entire fitness function. Therefore, the algorithm efficiently scales to many data owners. We define the cost of privacy as the difference between the fitness of a privacy-preserving machine-learning model and the fitness of the machine-learning model trained in the absence of privacy concerns. We demonstrate that the cost of privacy has an upper bound that is inversely proportional to the square of the combined size of the training datasets and the square of the sum of the privacy budgets. We validate the theoretical results with experiments on financial and medical datasets. The experiments illustrate that collaboration among more than 10 data owners with at least 10,000 records and privacy budgets greater than or equal to 1 results in a machine-learning model superior to one trained in isolation on only one of the datasets, illustrating both the value of collaboration and the cost of privacy. The number of collaborating datasets can be lowered if the privacy budgets are higher.
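To make the setup concrete, below is a minimal Python/NumPy sketch of the kind of asynchronous differentially-private training loop the abstract describes: a central learner holds the model and, whenever a single data owner becomes available, requests one noisy gradient computed on that owner's data alone, with noise calibrated to that owner's privacy budget. The least-squares loss, per-record clipping, Laplace mechanism, diminishing step size, and the uniform-sampling simulation of owner availability are illustrative assumptions, not necessarily the exact algorithm analyzed in the paper.

import numpy as np

rng = np.random.default_rng(0)

def noisy_gradient(theta, X, y, epsilon, clip=1.0):
    # Gradient of a least-squares loss on one owner's records, with per-record
    # clipping to bound sensitivity and Laplace noise scaled to the owner's
    # privacy budget epsilon (mechanism chosen for illustration only).
    residual = X @ theta - y
    per_record = X * residual[:, None]                      # one gradient per record
    norms = np.maximum(1.0, np.linalg.norm(per_record, axis=1) / clip)
    grad = (per_record / norms[:, None]).mean(axis=0)       # clipped average gradient
    noise = rng.laplace(scale=2.0 * clip / (len(y) * epsilon), size=theta.shape)
    return grad + noise

def asynchronous_dp_training(owners, dim, rounds=2000, lr0=1.0):
    # Central learner: at each step it talks to whichever single owner is
    # available (simulated here by uniform sampling) and takes one gradient
    # step, without waiting to aggregate responses from all owners.
    theta = np.zeros(dim)
    for t in range(1, rounds + 1):
        X, y, epsilon = owners[rng.integers(len(owners))]
        theta -= (lr0 / t) * noisy_gradient(theta, X, y, epsilon)
    return theta

# Hypothetical usage: a few owners with synthetic data and different budgets.
dim, owners = 5, []
true_theta = rng.normal(size=dim)
for epsilon in (0.5, 1.0, 2.0):
    X = rng.normal(size=(10_000, dim))
    owners.append((X, X @ true_theta + 0.1 * rng.normal(size=10_000), epsilon))
print(asynchronous_dp_training(owners, dim))

In such a loop, the abstract's bound says the gap between the privately-trained model and the non-private optimum (the cost of privacy) shrinks with the square of the combined number of records and the square of the total privacy budget, which is why a larger collaboration and larger budgets can compensate for one another.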

Updated: 2024-08-22