RL-Routing: An SDN Routing Algorithm Based on Deep Reinforcement Learning, IEEE Transactions on Network Science and Engineering



RL-Routing: An SDN Routing Algorithm Based on Deep Reinforcement Learning
IEEE Transactions on Network Science and Engineering (IF 6.6), Pub Date: 2020-08-19, DOI: 10.1109/tnse.2020.3017751
Yi-Ren Chen, Amir Rezapour, Wen-Guey Tzeng, Shi-Chun Tsai

Communication networks are difficult to model and predict because they have become very sophisticated and dynamic. We develop a reinforcement learning routing algorithm (RL-Routing) to solve a traffic engineering (TE) problem of SDN in terms of throughput and delay. RL-Routing solves the TE problem via experience, instead of building an accurate mathematical model. We consider comprehensive network information for state representation and use one-to-many network configuration for routing choices. Our reward function, which uses network throughput and delay, is adjustable for optimizing either upward or downward network throughput. After appropriate training, the agent learns a policy that predicts future behavior of the underlying network and suggests better routing paths between switches. The simulation results show that RL-Routing obtains higher rewards and enables a host to transfer a large file faster than Open Shortest Path First (OSPF) and Least Loaded (LL) routing algorithms on various network topologies. For example, on the NSFNet topology, the sum of rewards obtained by RL-Routing is 119.30, whereas those of OSPF and LL are 106.59 and 74.76, respectively. The average transmission time for a 40 GB file using RL-Routing is $25.2~\text{s}$. Those of OSPF and LL are $63~\text{s}$ and $53.4~\text{s}$, respectively.
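To put the reported NSFNet figures in perspective, the short sketch below derives the relative speedup, the average transfer rate, and the reward gain from the numbers quoted in the abstract. It is only illustrative arithmetic, not code from the paper: the helper names (`speedup`, `effective_rate_gbps`, `reward_gain_pct`) are hypothetical, and the rate calculation assumes decimal units (1 GB = 8 Gb).

```python
# Figures quoted in the abstract (NSFNet topology, 40 GB file transfer).
FILE_SIZE_GB = 40.0
transfer_time_s = {"RL-Routing": 25.2, "OSPF": 63.0, "LL": 53.4}
reward_sum = {"RL-Routing": 119.30, "OSPF": 106.59, "LL": 74.76}


def speedup(baseline: str, candidate: str = "RL-Routing") -> float:
    """Ratio of baseline transfer time to candidate transfer time."""
    return transfer_time_s[baseline] / transfer_time_s[candidate]


def effective_rate_gbps(algorithm: str) -> float:
    """Average transfer rate in Gb/s, assuming 1 GB = 8 Gb (decimal units)."""
    return FILE_SIZE_GB * 8 / transfer_time_s[algorithm]


def reward_gain_pct(baseline: str, candidate: str = "RL-Routing") -> float:
    """Relative increase in summed reward of the candidate over the baseline."""
    base = reward_sum[baseline]
    return 100.0 * (reward_sum[candidate] - base) / base


if __name__ == "__main__":
    for name in ("OSPF", "LL"):
        print(
            f"vs {name}: {speedup(name):.2f}x faster, "
            f"{reward_gain_pct(name):.1f}% higher summed reward"
        )
    for name in transfer_time_s:
        print(f"{name}: {effective_rate_gbps(name):.2f} Gb/s average rate")
```

Under these assumptions, RL-Routing completes the transfer about 2.5 times faster than OSPF and about 2.1 times faster than LL, while its summed reward is roughly 12% and 60% higher, respectively.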


Updated: 2020-08-19
