Multiagent Meta-Reinforcement Learning for Adaptive Multipath Routing Optimization
IEEE Transactions on Neural Networks and Learning Systems (IF 10.4). Pub Date: 2021-04-21. DOI: 10.1109/tnnls.2021.3070584
Long Chen, Bin Hu, Zhi-Hong Guan, Lian Zhao, Xuemin Shen

In this article, we investigate the routing problem of packet networks through multiagent reinforcement learning (RL), a very challenging topic in distributed and autonomous networked systems. Specifically, the routing problem is modeled as a networked multiagent partially observable Markov decision process (MDP). Since the MDP of a network node is affected not only by its neighboring nodes' policies but also by the network traffic demand, routing becomes a multitask learning problem. Inspired by the recent success of RL and metalearning, we propose two novel model-free multiagent RL algorithms, named multiagent proximal policy optimization (MAPPO) and multiagent metaproximal policy optimization (meta-MAPPO), to optimize network performance under fixed and time-varying traffic demand, respectively. A practicable distributed implementation framework is designed based on the separability of exploration and exploitation in training MAPPO. Simulation results demonstrate that the proposed algorithms outperform existing routing optimization policies.
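As a rough illustration of the PPO machinery that MAPPO builds on, the sketch below applies the standard clipped surrogate objective at a single router agent. The observation size, the action space (a choice among candidate next hops/paths), the network widths, and the clip ratio are illustrative assumptions; the abstract does not specify the paper's actual architecture or hyperparameters.

    # Minimal per-agent PPO sketch, assuming a discrete next-hop/path action space.
    import torch
    import torch.nn as nn

    class RoutingPolicy(nn.Module):
        """Per-node policy: maps a local observation (e.g., queue lengths on
        outgoing links) to a distribution over candidate next hops/paths."""
        def __init__(self, obs_dim: int, n_paths: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, 64), nn.Tanh(),
                nn.Linear(64, n_paths),
            )

        def forward(self, obs: torch.Tensor) -> torch.distributions.Categorical:
            return torch.distributions.Categorical(logits=self.net(obs))

    def ppo_clip_loss(policy, obs, actions, old_log_probs, advantages, clip_eps=0.2):
        """Clipped surrogate: -E[min(r_t * A_t, clip(r_t, 1-eps, 1+eps) * A_t)]."""
        dist = policy(obs)
        ratio = torch.exp(dist.log_prob(actions) - old_log_probs)
        unclipped = ratio * advantages
        clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
        return -torch.min(unclipped, clipped).mean()

    # Toy update on random data for one agent (illustrative only).
    if __name__ == "__main__":
        policy = RoutingPolicy(obs_dim=8, n_paths=3)
        opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
        obs = torch.randn(32, 8)
        with torch.no_grad():
            dist = policy(obs)
            actions = dist.sample()
            old_log_probs = dist.log_prob(actions)
        advantages = torch.randn(32)  # stand-in for estimated advantages
        loss = ppo_clip_loss(policy, obs, actions, old_log_probs, advantages)
        opt.zero_grad()
        loss.backward()
        opt.step()

In a full MAPPO setup, each node would run such an update on locally collected trajectories, with advantages estimated by a learned value function; the meta-MAPPO variant would additionally adapt policy parameters across traffic-demand tasks, along the lines of gradient-based metalearning.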

Updated: 2021-04-21