Distributed optimization for degenerate loss functions arising from over-parameterization
Artificial Intelligence (IF 14.4) Pub Date: 2021-08-16, DOI: 10.1016/j.artint.2021.103575
Chi Zhang, Qianxiao Li
We consider distributed optimization with degenerate loss functions, where the optimal sets of the local loss functions have a non-empty intersection. This regime often arises when optimizing large-scale multi-agent AI systems (e.g., deep learning systems), where the number of trainable weights far exceeds the number of training samples, leading to highly degenerate loss surfaces. Under appropriate conditions, we prove that in this case distributed gradient descent converges even when communication is made arbitrarily infrequent, which is not true for non-degenerate loss functions. Moreover, we quantitatively analyze the convergence rate as well as the communication-computation trade-off, providing insights into the design of efficient distributed optimization algorithms. Our theoretical findings are confirmed by both distributed convex optimization and deep learning experiments.
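The setting described in the abstract can be illustrated with a small numerical sketch (not taken from the paper): several agents each hold an over-parameterized least-squares problem generated from a shared weight vector, so each local minimizer set is an affine subspace and all of them share a common point; every agent runs local gradient descent and the iterates are averaged only once per communication round. All names and parameter choices below are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: distributed gradient descent with infrequent averaging
# on an over-parameterized least-squares problem. Each agent holds fewer
# samples than there are parameters, so every local loss has a whole
# affine subspace of minimizers, and a shared ground-truth weight vector
# guarantees the local optimal sets intersect (the degenerate regime).

rng = np.random.default_rng(0)
dim, samples_per_agent, num_agents = 50, 5, 4   # dim >> samples: over-parameterized
w_true = rng.normal(size=dim)                   # common minimizer for all agents

# Each agent's data satisfies A_i w_true = b_i exactly, so loss_i(w_true) = 0.
data = []
for _ in range(num_agents):
    A = rng.normal(size=(samples_per_agent, dim))
    data.append((A, A @ w_true))

def local_grad(w, A, b):
    # Gradient of the local loss 0.5 * ||A w - b||^2
    return A.T @ (A @ w - b)

def run(local_steps, rounds=200, lr=0.01):
    # Each agent takes `local_steps` gradient steps between averaging rounds,
    # so larger `local_steps` means less frequent communication.
    w = [np.zeros(dim) for _ in range(num_agents)]
    for _ in range(rounds):
        for i, (A, b) in enumerate(data):
            for _ in range(local_steps):
                w[i] = w[i] - lr * local_grad(w[i], A, b)
        w_avg = sum(w) / num_agents            # communication (averaging) step
        w = [w_avg.copy() for _ in range(num_agents)]
    # Global loss of the averaged iterate across all agents
    return sum(0.5 * np.linalg.norm(A @ w_avg - b) ** 2 for A, b in data)

# In this degenerate setting, even infrequent communication (many local
# steps per round) still drives the global loss toward zero.
for local_steps in (1, 10, 50):
    print(f"local steps per round = {local_steps:3d}, final loss = {run(local_steps):.3e}")
```

Because the local solution sets intersect, the local iterates do not pull the average toward conflicting optima, which is the intuition the sketch is meant to convey; with non-degenerate local losses the same schedule would stall away from the global minimizer as communication becomes less frequent.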




Updated: 2021-08-25