Differentially Private Distributed Online Learning
IEEE Transactions on Knowledge and Data Engineering (IF 8.9), Pub Date: 2018-08-01, DOI: 10.1109/tkde.2018.2794384
Chencheng Li, Pan Zhou, Li Xiong, Qian Wang, Ting Wang
In the big data era, data generation exhibits new characteristics, including wide distribution, high velocity, high dimensionality, and privacy concerns. To address these challenges for big data analytics, we develop a privacy-preserving distributed online learning framework over data collected from distributed sources. Specifically, each node (i.e., data source) is able to learn a model from its local dataset, and exchanges intermediate parameters with a random subset of its neighboring (logically connected) nodes. Hence, the communication topology of our distributed computing framework is not fixed in practice. Since online learning often operates on sensitive data, we introduce the notion of differential privacy (DP) into our distributed online learning algorithm (DOLA) to protect data privacy during learning, preventing an adversary from inferring any significant sensitive information. Our model is of general value for big data analytics in the distributed setting, because it provides rigorous and scalable privacy proofs and incurs much lower computational complexity than classic schemes, e.g., secure multiparty computation (SMC). To handle high-dimensional incoming data entries, we study a sparse version of the DOLA with novel DP techniques that save computing resources and improve utility. Furthermore, we present two modified private DOLAs to meet the needs of practical applications: one converts the DOLA to distributed stochastic optimization in an offline setting, and the other uses a mini-batch approach to reduce the amount of perturbation noise and improve utility. We conduct experiments on real datasets on a configured distributed platform; the numerical results validate the feasibility of our private DOLAs.
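To make the update pattern concrete, the Python sketch below shows one round of a differentially private distributed online update of the kind the abstract describes: a node averages the parameters received from a random subset of its neighbors, takes a local (sub)gradient step, and perturbs the result with Laplace noise before broadcasting it. This is a minimal illustration under stated assumptions; the function name, the diminishing step size, and the noise calibration (scale proportional to step size times the Lipschitz constant, divided by epsilon) are our own illustrative choices, not the paper's exact algorithm.

    import numpy as np

    def dp_dola_step(w_local, neighbor_params, grad, t, epsilon, lipschitz=1.0):
        """One private update at a single node (illustrative sketch).

        w_local:         this node's current parameter vector
        neighbor_params: parameters received from a random subset of neighbors
        grad:            (sub)gradient of the local online loss at w_local
        t:               round index, used for the diminishing step size
        epsilon:         per-round differential-privacy budget
        """
        eta = 1.0 / np.sqrt(t)                                      # diminishing step size
        w_avg = np.mean([w_local] + list(neighbor_params), axis=0)  # consensus averaging
        w_new = w_avg - eta * grad                                  # local online gradient step
        scale = 2.0 * lipschitz * eta / epsilon                     # assumed Laplace calibration
        noise = np.random.laplace(0.0, scale, size=w_new.shape)
        return w_new + noise                                        # noisy vector is what gets shared

    # Toy driver: online hinge-loss updates with stand-in neighbor states.
    w = np.zeros(5)
    for t in range(1, 101):
        x, y = np.random.randn(5), np.random.choice([-1.0, 1.0])
        grad = -y * x if y * np.dot(w, x) < 1 else np.zeros(5)      # hinge-loss subgradient
        neighbors = [w + 0.01 * np.random.randn(5) for _ in range(3)]
        w = dp_dola_step(w, neighbors, grad, t, epsilon=0.5)

Because only the noise-perturbed vector leaves the node, an adversary observing the exchanged parameters learns little about any individual record; the mini-batch variant mentioned in the abstract would average several gradients per round, which lets the same privacy budget tolerate proportionally less noise per data point.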
