A sample efficient model-based deep reinforcement learning algorithm with experience replay for robot manipulation
International Journal of Intelligent Robotics and Applications (IF 2.1) Pub Date: 2020-05-16, DOI: 10.1007/s41315-020-00135-2
Cheng Zhang , Liang Ma , Alexander Schmitz

For robot manipulation, reinforcement learning provides an effective end-to-end approach to controlling a complicated dynamic system. Model-free reinforcement learning methods ignore the system dynamics and are limited to simple behavior control. By contrast, model-based methods can quickly reach optimal trajectory planning by building a model of the system dynamics. However, it is not easy to build an accurate and efficient system model with high generalization ability, especially for a complex dynamic system and varied manipulation tasks. Furthermore, when the rewards provided by the environment are sparse, the agent loses effective guidance and fails to optimize its policy efficiently, which considerably decreases sample efficiency. In this paper, a model-based deep reinforcement learning algorithm, in which a deep neural network is used to simulate the system dynamics, is designed for robot manipulation. The proposed deep neural network model is robust enough to handle complex control tasks and generalizes well. Moreover, a curiosity-based experience replay method is incorporated to address the sparse reward problem and improve sample efficiency. The agent, which manipulates a robotic hand, is encouraged to explore optimal trajectories according to its failure experience. Simulation results demonstrate the effectiveness of the proposed method: various manipulation tasks are achieved in a complex dynamic system, and sample efficiency improves even in a sparse reward environment, as the learning time is reduced considerably.
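The two ingredients described above, a learned dynamics model and curiosity-based experience replay, can be sketched minimally. The following is an illustrative sketch, not the authors' implementation: it substitutes a linear model for their deep network, and uses a replay buffer that samples transitions in proportion to the model's prediction error as the curiosity signal. All class names, parameters, and the prioritization rule are assumptions made for illustration.

```python
import numpy as np

class LinearDynamicsModel:
    """Stand-in for the paper's deep network: predicts s' from (s, a)
    with a linear map trained by stochastic gradient descent."""
    def __init__(self, state_dim, action_dim, lr=1e-2):
        self.W = np.zeros((state_dim, state_dim + action_dim))
        self.lr = lr

    def predict(self, s, a):
        return self.W @ np.concatenate([s, a])

    def update(self, s, a, s_next):
        x = np.concatenate([s, a])
        err = self.predict(s, a) - s_next
        # Gradient of 0.5 * ||W x - s_next||^2 with respect to W.
        self.W -= self.lr * np.outer(err, x)
        # The squared prediction error doubles as the curiosity signal:
        # transitions the model predicts poorly are "surprising".
        return float(np.sum(err ** 2))

class CuriosityReplayBuffer:
    """Replay buffer that samples transitions with probability proportional
    to the model's prediction error, so surprising (often failed) experience
    is replayed more frequently than well-understood experience."""
    def __init__(self):
        self.transitions = []
        self.priorities = []

    def add(self, transition, curiosity):
        self.transitions.append(transition)
        self.priorities.append(curiosity + 1e-6)  # keep probabilities nonzero

    def sample(self, batch_size, rng):
        p = np.asarray(self.priorities)
        p = p / p.sum()
        idx = rng.choice(len(self.transitions), size=batch_size, p=p)
        return [self.transitions[i] for i in idx]
```

In this sketch the same prediction error serves two roles, as in curiosity-driven methods generally: it is the training loss of the dynamics model and the priority that biases replay toward transitions the agent does not yet understand, which is what provides a learning signal when the environment reward is sparse.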

Updated: 2020-05-16