A multi-robot path-planning algorithm for autonomous navigation using meta-reinforcement learning based on transfer learning
Applied Soft Computing ( IF 7.2 ) Pub Date : 2021-06-21 , DOI: 10.1016/j.asoc.2021.107605
Shuhuan Wen , Zeteng Wen , Di Zhang , Hong Zhang , Tao Wang

The adaptability of multi-robot systems in complex environments is a hot topic. To handle static and dynamic obstacles in complex environments, this paper presents dynamic proximal meta policy optimization with covariance matrix adaptation evolutionary strategies (dynamic-PMPO-CMA) to avoid obstacles and achieve autonomous navigation. Firstly, we propose dynamic proximal policy optimization with covariance matrix adaptation evolutionary strategies (dynamic-PPO-CMA), built on the original proximal policy optimization (PPO), to obtain an effective obstacle-avoidance policy. Simulation results show that the proposed dynamic-PPO-CMA can avoid obstacles and reach the designated target position successfully. Secondly, to improve the adaptability of multi-robot systems to different environments, we integrate meta-learning with dynamic-PPO-CMA to form the dynamic-PMPO-CMA algorithm. During training, the proposed dynamic-PMPO-CMA trains robots to learn a multi-task policy. Finally, during testing, transfer learning is introduced into the proposed dynamic-PMPO-CMA algorithm: the trained parameters of the meta policy are transferred to new environments and used as initial parameters. Simulation results show that the proposed algorithm converges faster and reaches the destination more quickly than PPO, PMPO, and dynamic-PPO-CMA.
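The dynamic-PPO-CMA variants described above build on PPO's clipped surrogate objective, which bounds each policy update. As a minimal illustrative sketch of that objective (standard PPO, not the authors' dynamic-PMPO-CMA code; the function name and toy values are ours):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """Clipped surrogate objective from standard PPO.

    ratio:     pi_new(a|s) / pi_old(a|s) for sampled actions
    advantage: advantage estimates for those actions
    epsilon:   clipping range (0.2 is a common default)
    """
    unclipped = ratio * advantage
    # Clipping the ratio keeps the new policy close to the old one
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    # PPO maximizes the element-wise minimum of the two terms
    return np.minimum(unclipped, clipped).mean()

# Toy batch: three sampled actions with their ratios and advantages
ratios = np.array([0.9, 1.1, 1.5])
advantages = np.array([1.0, -0.5, 2.0])
objective = ppo_clip_objective(ratios, advantages)
```

In the paper's scheme, the CMA-ES component adapts the exploration distribution on top of such PPO updates, and the meta-trained parameters serve as the initialization when the policy is transferred to a new environment.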




Updated: 2021-06-25