Distributed Fusion-Based Policy Search for Fast Robot Locomotion Learning
IEEE Computational Intelligence Magazine (IF 9) Pub Date: 2019-08-01, DOI: 10.1109/mci.2019.2919364
Zhengcai Cao, Qing Xiao, Mengchu Zhou

Deep reinforcement learning methods have been developed to deal with challenging locomotion control problems in the robotics domain and can achieve significant performance improvements over conventional control methods. One of their appealing advantages is that they are model-free. In other words, agents learn a control policy completely from scratch using raw high-dimensional sensory observations. However, they often suffer from poor sample efficiency and instability, which makes them inapplicable to many engineering systems. This paper presents a distributed fusion-based policy search framework that accelerates robot locomotion learning through variance reduction and asynchronous exploration. An adaptive fusion-based variance reduction technique is introduced to improve sample efficiency. Parametric noise is added to the neural network weights, which leads to efficient exploration and ensures consistency in actions. Subsequently, the fusion-based policy gradient estimator is extended to a distributed decoupled actor-critic architecture. This allows the central estimator to handle off-policy data from different actors asynchronously, fully utilizing CPUs and GPUs to maximize data throughput. The aim of this work is to improve the sample efficiency and convergence speed of deep reinforcement learning in robot locomotion tasks. Simulation results are presented to verify the theoretical results, and they show that the proposed algorithm matches and sometimes surpasses state-of-the-art performance.
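To illustrate the exploration idea the abstract describes, here is a minimal sketch of parameter-space noise: rather than injecting noise into each action, the policy's weights are perturbed once per episode, so the perturbed policy acts deterministically within that episode and its actions stay consistent across similar states. This is a generic illustration of the technique, not the paper's exact algorithm; the linear policy, dimensions, and names below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_policy(weights):
    """Illustrative linear policy: action = tanh(W @ state)."""
    def policy(state):
        return np.tanh(weights @ state)
    return policy

base_weights = rng.normal(size=(2, 4))  # 4-dim state -> 2-dim action
noise_std = 0.1                         # scale of the weight perturbation

def perturbed_policy():
    """Draw ONE weight perturbation, held fixed for a whole episode."""
    noisy = base_weights + noise_std * rng.normal(size=base_weights.shape)
    return make_policy(noisy)

state = np.ones(4)
pi = perturbed_policy()
a1, a2 = pi(state), pi(state)
# Same perturbed weights -> identical actions for the same state,
# unlike per-step action noise, which would differ between calls.
assert np.allclose(a1, a2)
```

The key design point is that the noise source lives in weight space: within an episode the agent behaves like a single deterministic policy, which is what the abstract means by exploration that "ensures consistency in actions".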

Updated: 2019-08-01