Preserving User Privacy for Machine Learning: Local Differential Privacy or Federated Machine Learning?
IEEE Intelligent Systems (IF 5.6), Pub Date: 2020-07-20, DOI: 10.1109/mis.2020.3010335
Huadi Zheng, Haibo Hu, Ziyang Han

The growing number of mobile and IoT devices has nourished many intelligent applications. In order to produce high-quality machine learning models, these applications constantly access and collect rich personal data such as photos, browsing history, and text messages. However, direct access to personal data has raised increasing public concerns about privacy risks and security breaches. To address these concerns, there are two emerging solutions to privacy-preserving machine learning, namely local differential privacy and federated machine learning. The former is a distributed data collection strategy where each client perturbs data locally before submitting it to the server, whereas the latter is a distributed machine learning strategy that trains models on mobile devices locally and merges their output (e.g., parameter updates of a model) through a control protocol. In this article, we conduct a comparative study on the efficiency and privacy of both solutions. Our results show that in a standard population and domain setting, both can achieve an optimal misclassification rate lower than 20%, and federated machine learning generally performs better at the cost of higher client CPU usage. Nonetheless, local differential privacy can benefit more from a larger client population (≫ 1k). As for privacy guarantee, local differential privacy also has flexible control over the data leakage.
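The abstract contrasts two mechanisms: per-client perturbation before upload (local differential privacy) and on-device training with server-side merging of parameter updates (federated machine learning). The following minimal Python sketch illustrates the general shape of each; the randomized-response perturbation, the unweighted FedAvg-style averaging, and all parameter values are illustrative assumptions, not the authors' experimental setup.

```python
from __future__ import annotations
import math
import random

def randomized_response(true_bit: int, epsilon: float) -> int:
    """LDP client step (illustrative): report the private bit truthfully with
    probability e^eps / (e^eps + 1), otherwise flip it before submission."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return true_bit if random.random() < p_truth else 1 - true_bit

def federated_average(client_updates: list[list[float]]) -> list[float]:
    """Federated server step (illustrative): merge per-client parameter
    updates by simple unweighted averaging, FedAvg-style."""
    n_clients = len(client_updates)
    n_params = len(client_updates[0])
    return [sum(update[i] for update in client_updates) / n_clients
            for i in range(n_params)]

# Toy population: 1,000 clients, each holding one private bit (LDP path)
# and one locally computed two-parameter model update (federated path).
epsilon = 1.0
clients = [random.randint(0, 1) for _ in range(1000)]
ldp_reports = [randomized_response(bit, epsilon) for bit in clients]

# Debias the aggregated noisy reports to estimate the true count of 1-bits;
# the server only ever sees the perturbed values.
p = math.exp(epsilon) / (math.exp(epsilon) + 1)
est_ones = (sum(ldp_reports) - len(clients) * (1 - p)) / (2 * p - 1)

updates = [[random.gauss(0.5, 0.1), random.gauss(-0.2, 0.1)] for _ in clients]
merged = federated_average(updates)
print(round(est_ones), merged)
```

The sketch mirrors the trade-off the article measures: the LDP client ships already-noisy data and its accuracy improves with a larger client population, while the federated client spends local CPU on training and ships only model updates.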

Updated: 2020-07-20