Deep Reinforcement Learning with Stage Incentive Mechanism for Robotic Trajectory Planning
arXiv - CS - Robotics. Pub Date: 2020-09-25. DOI: arxiv-2009.12068. Jin Yang, Gang Peng
This paper aims to improve the efficiency of deep reinforcement learning (DRL)
based methods for robot manipulator trajectory planning in random working
environments. In contrast to the traditional sparse reward function, we present
three dense reward functions. First, a posture reward function is proposed to
accelerate the learning process and produce a more reasonable trajectory by
modeling distance and direction constraints, which reduces the blindness of
exploration. Second, to improve stability, a stride reward function is proposed
by modeling constraints on the distance to the target and the movement distance
of the joints, which makes the learning process more stable. To further improve
learning efficiency, inspired by the cognitive process of human behavior, we
propose a stage incentive mechanism comprising a hard stage incentive reward
function and a soft stage incentive reward function. Extensive experiments show
that the proposed soft stage incentive reward function improves the convergence
rate by up to 46.9% with state-of-the-art DRL methods. The convergence mean
reward increases by 4.4%~15.5%, and its standard deviation decreases by
21.9%~63.2%. In the evaluation, the success rate of trajectory planning for the
robot manipulator reaches 99.6%.
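The abstract describes dense reward shaping: a posture reward combining distance and direction terms, and a soft stage incentive that grants graded bonuses as the manipulator nears its goal. The paper's exact formulas are not given here, so the following is only a minimal sketch of what such reward functions could look like; the function names, weights (`w_d`, `w_a`), thresholds, and the sigmoid blending are illustrative assumptions, not the authors' definitions.

```python
import math

def posture_reward(ee_pos, target_pos, ee_dir, target_dir, w_d=1.0, w_a=0.5):
    """Hypothetical dense posture reward (assumed form, not the paper's):
    penalize end-effector distance to the target and reward alignment of
    the approach direction with the desired direction."""
    dist = math.dist(ee_pos, target_pos)               # Euclidean distance term
    dot = sum(a * b for a, b in zip(ee_dir, target_dir))
    align = dot / (math.hypot(*ee_dir) * math.hypot(*target_dir))  # cosine similarity
    return -w_d * dist + w_a * align

def soft_stage_incentive(dist, thresholds=(0.3, 0.1, 0.02),
                         bonuses=(0.5, 1.0, 2.0), k=20.0):
    """Hypothetical soft stage incentive: each stage's bonus is blended in
    smoothly with a sigmoid as the distance drops below its threshold,
    avoiding the reward discontinuities of a hard (step-function) variant."""
    return sum(b / (1.0 + math.exp(k * (dist - t)))
               for t, b in zip(thresholds, bonuses))
```

A hard stage incentive would replace each sigmoid with a step function (`b if dist < t else 0.0`); the soft version keeps the total reward continuous in the distance, which is the kind of smoothness that typically stabilizes policy-gradient training.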
Updated: 2020-09-28