Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data
arXiv - CS - Cryptography and Security. Pub Date: 2020-07-07, DOI: arxiv-2007.03724
Alireza Sadeghi, Gang Wang, Meng Ma, Georgios B. Giannakis

Data used to train machine learning models can be adversarial: maliciously constructed by adversaries to fool the model. Challenges also arise from privacy, confidentiality, or legal constraints when data are gathered and stored across geographically dispersed learners, some of which may even hold an "anonymized" or unreliable dataset. In this context, the distributionally robust optimization framework is considered for training a parametric model, in both centralized and federated learning settings. The objective is to endow the trained model with robustness against adversarially manipulated input data, or distributional uncertainties such as mismatches between training and testing data distributions, or among datasets stored at different workers. To this end, the data distribution is assumed unknown and lying within a Wasserstein ball centered around the empirical data distribution. This robust learning task entails an infinite-dimensional optimization problem, which is challenging. Leveraging a strong duality result, a surrogate is obtained, for which three stochastic primal-dual algorithms are developed: i) stochastic proximal gradient descent with an $\epsilon$-accurate oracle, which invokes an oracle to solve the convex sub-problems; ii) stochastic proximal gradient descent-ascent, which approximates the solution of the convex sub-problems via a single gradient ascent step; and iii) a distributionally robust federated learning algorithm, which solves the sub-problems locally at the different workers where data are stored. Compared to empirical risk minimization and standard federated learning methods, the proposed algorithms offer robustness with little computational overhead. Numerical tests using image datasets showcase the merits of the proposed algorithms under several existing adversarial attacks and distributional uncertainties.
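To make the descent-ascent variant (algorithm ii above) concrete, here is a minimal sketch, not the authors' implementation: it assumes a logistic-regression loss and a squared-Euclidean transport cost in the Wasserstein ball, and the names dr_sgda, grad_theta, and grad_x are hypothetical. Because the duality penalty -gamma*||z - x||^2 has zero gradient at z = x, a single ascent step initialized at the data point reduces to an unpenalized gradient step on the input, so the penalty weight gamma drops out of this one-step approximation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_theta(theta, X, y):
    # Mini-batch gradient of the logistic loss w.r.t. the model parameters.
    p = sigmoid(X @ theta)
    return X.T @ (p - y) / len(y)

def grad_x(theta, X, y):
    # Per-sample gradient of the logistic loss w.r.t. the inputs.
    p = sigmoid(X @ theta)
    return np.outer(p - y, theta)

def dr_sgda(X, y, lr_theta=0.1, lr_zeta=0.1, epochs=20, batch=32, seed=0):
    # Distributionally robust training via stochastic gradient descent-ascent.
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        idx = rng.permutation(len(y))
        for s in range(0, len(y), batch):
            b = idx[s:s + batch]
            Xb, yb = X[b], y[b]
            # Inner maximization, approximated by ONE gradient-ascent step:
            # zeta ~= argmax_z loss(theta; z, y) - gamma * ||z - x||^2.
            # Starting the ascent at z = x, the penalty gradient vanishes,
            # so the step is simply x plus a scaled input gradient.
            zeta = Xb + lr_zeta * grad_x(theta, Xb, yb)
            # Outer minimization: SGD step on the surrogate loss evaluated
            # at the adversarially perturbed samples.
            theta -= lr_theta * grad_theta(theta, zeta, yb)
    return theta

# Toy usage: two separable Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 1.0, size=(200, 2)),
               rng.normal(+1.0, 1.0, size=(200, 2))])
y = np.r_[np.zeros(200), np.ones(200)]
print("robust parameters:", dr_sgda(X, y))
```

Relative to plain SGD, the only extra work per mini-batch is one input-gradient evaluation, which matches the abstract's claim that robustness comes at little computational overhead.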

Updated: 2020-07-09