Reinforcement Learning for Datacenter Congestion Control
arXiv - CS - Networking and Internet Architecture, Pub Date: 2021-02-18, DOI: arxiv-2102.09337
Chen Tessler, Yuval Shpigelman, Gal Dalal, Amit Mandelbaum, Doron Haritan Kazakov, Benjamin Fuhrer, Gal Chechik, Shie Mannor

We approach the task of network congestion control in datacenters using Reinforcement Learning (RL). Successful congestion control algorithms can dramatically improve latency and overall network throughput. Until today, no such learning-based algorithms have shown practical potential in this domain. Evidently, the most popular recent deployments rely on rule-based heuristics that are tested on a predetermined set of benchmarks. Consequently, these heuristics do not generalize well to newly-seen scenarios. Contrarily, we devise an RL-based algorithm with the aim of generalizing to different configurations of real-world datacenter networks. We overcome challenges such as partial-observability, non-stationarity, and multi-objectiveness. We further propose a policy gradient algorithm that leverages the analytical structure of the reward function to approximate its derivative and improve stability. We show that this scheme outperforms alternative popular RL approaches, and generalizes to scenarios that were not seen during training. Our experiments, conducted on a realistic simulator that emulates communication networks' behavior, exhibit improved performance concurrently on the multiple considered metrics compared to the popular algorithms deployed today in real datacenters. Our algorithm is being productized to replace heuristics in some of the largest datacenters in the world.
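The abstract mentions a policy gradient method that uses the analytical structure of the reward to approximate its derivative. It does not give the reward or the update rule, so the following is only a toy sketch of the general idea and not the authors' algorithm. It assumes a hypothetical one-dimensional Gaussian policy and a made-up quadratic reward, and compares a score-function (REINFORCE-style) gradient estimate with one that differentiates the known reward directly. All names (`reward`, `reward_grad`, `gradient_samples`) and all constants are illustrative assumptions.

```python
import random
import statistics


def reward(action, target):
    # Hypothetical reward with a known closed form (assumption, not from the paper).
    return -(action - target) ** 2


def reward_grad(action, target):
    # Analytic derivative of the hypothetical reward with respect to the action.
    return -2.0 * (action - target)


def gradient_samples(mu, sigma, target, n, seed=0):
    """Draw per-sample estimates of d/dmu E[reward] for a ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    score, analytic = [], []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        action = mu + sigma * eps
        # Score-function estimate: reward times d/dmu of the log-density.
        score.append(reward(action, target) * (action - mu) / sigma ** 2)
        # Analytic estimate: differentiate the reward through action = mu + sigma * eps.
        analytic.append(reward_grad(action, target))
    return score, analytic


if __name__ == "__main__":
    score, analytic = gradient_samples(mu=0.0, sigma=0.5, target=1.0, n=20000)
    print("score-function: mean", statistics.mean(score),
          "variance", statistics.variance(score))
    print("analytic:       mean", statistics.mean(analytic),
          "variance", statistics.variance(analytic))
```

In this toy setting both estimators target the same gradient, but the analytic one has much lower variance, which is one plausible reading of how exploiting a known reward structure could improve stability.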

Updated: 2021-02-19