Tensor-Based Reinforcement Learning for Network Routing, IEEE Journal of Selected Topics in Signal Processing

Tensor-Based Reinforcement Learning for Network Routing
IEEE Journal of Selected Topics in Signal Processing (IF 8.7), Pub Date: 2021-02-02, DOI: 10.1109/jstsp.2021.3055957
Kai-Chu Tsai, Zirui Zhuang, Ricardo Lent, Jingyu Wang, Qi Qi, Li-Chun Wang, Zhu Han

In recent years, we have witnessed an explosion of networking applications, driven by the rapid development of cloud infrastructure, edge computing, and the Internet of Things. As those applications grow more complex, problems related to the large size of the state space and limited metric collection have emerged. This leads to an urgent demand for adaptive management methods in network routing. However, the complexity of traditional routing algorithms can be prohibitive for practical systems. To overcome this challenge, we propose a novel tensor-based reinforcement learning method to route and schedule packet flows, which is adaptive and model-free. Moreover, we improve the learning quality and efficiency by incorporating the Tucker decomposition technique into the learning process so that the machine learning direction can be obtained with low complexity. Finally, simulation results show that, for the same number of training episodes, our proposed algorithm achieves better performance and more stable results with less convergence time than the conventional routing method, K-shortest path, and traditional reinforcement learning approaches (i.e., Q-learning and SARSA), with results comparable to DQL.
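
The abstract does not say how the Tucker decomposition is wired into the learning loop, so the following is only a minimal sketch of the decomposition itself, not the authors' algorithm. It assumes the Q-values are stored as a three-way tensor with hypothetical axes (queue level, destination, next hop), and it uses a truncated higher-order SVD, which is one standard way to compute a Tucker decomposition. All function and variable names are illustrative.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_dot(tensor, matrix, mode):
    """Mode-n product: multiply `tensor` by `matrix` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)

def truncated_hosvd(tensor, ranks):
    """Tucker decomposition via truncated higher-order SVD.

    Returns a small core tensor and one factor matrix per mode.
    """
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of each unfolding span that mode.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, factor in enumerate(factors):
        core = mode_dot(core, factor.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Rebuild the full tensor from the core and factor matrices."""
    out = core
    for mode, factor in enumerate(factors):
        out = mode_dot(out, factor, mode)
    return out

# Hypothetical Q-tensor: 6 queue levels x 5 destinations x 4 next hops,
# built to have multilinear rank at most (2, 2, 2) so truncation is lossless.
rng = np.random.default_rng(0)
true_core = rng.normal(size=(2, 2, 2))
true_factors = [rng.normal(size=(n, 2)) for n in (6, 5, 4)]
q_tensor = reconstruct(true_core, true_factors)

core, factors = truncated_hosvd(q_tensor, ranks=(2, 2, 2))
approx = reconstruct(core, factors)
stored = core.size + sum(f.size for f in factors)
print(q_tensor.size, stored, np.allclose(approx, q_tensor))
```

In this toy case the compressed form stores far fewer numbers than the full tensor while reproducing it to floating-point precision. A real Q-tensor would generally be only approximately low-rank, so the chosen ranks would trade accuracy against storage and computation.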

Updated: 2021-02-02
