Staleness Analysis in Asynchronous Optimization, IEEE Transactions on Signal and Information Processing over Networks


Staleness Analysis in Asynchronous Optimization
IEEE Transactions on Signal and Information Processing over Networks (IF 3.2), Pub Date: 2022-04-06, DOI: 10.1109/tsipn.2022.3163931
Haider Al-Lawati, Stark Draper


Distributed optimization is widely used to solve large-scale optimization problems by parallelizing gradient-based algorithms across multiple computing nodes. In asynchronous optimization, the optimization parameter is updated using stale gradients, which are gradients computed with respect to outdated parameters. Although large degrees of staleness can slow convergence, little is known about the impact of staleness and its relation to other system parameters. In this work, we analyze asynchronous optimization when implemented using either hub-and-spoke or shared memory architectures. We show that the process of gradient arrival to the master node is similar in nature to a renewal process. We derive the bandwidth requirement of the system. For the hub-and-spoke setup, we derive bounds on the expected gradient staleness and show its connection to other system parameters such as the number of workers, expected compute time, and communication delays. Our derivations reveal that it is possible to adjust gradient staleness by tuning certain parameters such as minibatch size or the number of workers. For the shared memory architecture, we show that the expected staleness is equivalent to the number of workers. Our derivations can be used in existing convergence analyses to express convergence rates in terms of other known system parameters. Such an expression gives further details on what factors impact convergence.
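To make the notion of gradient staleness concrete, the following is a minimal toy simulation of a hub-and-spoke setup. It is a sketch under stated assumptions, not the authors' model: compute times are assumed to be i.i.d. exponential, the communication delay is a fixed constant, the master applies each gradient the moment it arrives, and staleness is counted as the number of master updates applied between the parameter version a worker read and the arrival of its gradient. The function name `simulate_staleness` and all of its parameters are hypothetical.

```python
import heapq
import random


def simulate_staleness(num_workers, num_updates, mean_compute=1.0,
                       comm_delay=0.0, seed=0):
    """Toy event-driven simulation of a hub-and-spoke parameter server.

    Assumptions (not taken from the paper): exponential compute times,
    a constant one-way communication delay, and a master that applies
    every gradient immediately on arrival.
    Returns the staleness observed for each applied gradient.
    """
    rng = random.Random(seed)
    version = 0  # number of updates the master has applied so far

    def round_trip():
        # master -> worker, compute the gradient, worker -> master
        return comm_delay + rng.expovariate(1.0 / mean_compute) + comm_delay

    # Each event: (arrival time at master, worker id, version the worker read)
    events = [(round_trip(), w, 0) for w in range(num_workers)]
    heapq.heapify(events)

    staleness = []
    while len(staleness) < num_updates:
        t, w, read_version = heapq.heappop(events)
        # Updates applied since this worker last read the parameters
        staleness.append(version - read_version)
        version += 1
        # The worker immediately receives the new parameters and starts again
        heapq.heappush(events, (t + round_trip(), w, version))
    return staleness


if __name__ == "__main__":
    for n in (2, 4, 8):
        s = simulate_staleness(num_workers=n, num_updates=20000)
        print(n, sum(s) / len(s))
```

In this toy setup every worker always has exactly one gradient in flight, so under this counting convention the average staleness is driven by the number of workers rather than by the compute-time distribution. Models with batching at the master, or with a different staleness convention, would behave differently.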


Updated: 2022-04-06
