Deep Reinforcement Learning with Stage Incentive Mechanism for Robotic Trajectory Planning
arXiv - CS - Robotics. Pub Date: 2020-09-25. DOI: arxiv-2009.12068. Jin Yang, Gang Peng
This paper aims to improve the efficiency of deep reinforcement learning (DRL)
based methods for robot manipulator trajectory planning in random working
environments. In contrast to the traditional sparse reward function, we present
three dense reward functions. First, a posture reward function is proposed to
accelerate the learning process and produce a more reasonable trajectory by
modeling distance and direction constraints, which reduces the blindness of
exploration. Second, to improve stability, a stride reward function is proposed
by modeling constraints on the distance to the target and the movement distance
of the joints, which makes the learning process more stable. To further improve
learning efficiency, inspired by the cognitive process of human behavior, we
propose a stage incentive mechanism comprising a hard stage incentive reward
function and a soft stage incentive reward function. Extensive experiments show
that the proposed soft stage incentive reward function improves the convergence
rate by up to 46.9% with state-of-the-art DRL methods. The convergence mean
reward increases by 4.4%~15.5%, and its standard deviation decreases by
21.9%~63.2%. In the evaluation, the success rate of trajectory planning for the
robot manipulator reaches 99.6%.
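The abstract describes dense reward shaping: a posture reward combining distance and direction terms, and a soft stage incentive that grants graded bonuses as the manipulator nears its goal. The paper's exact formulas are not given here, so the following is only a minimal sketch of what such reward functions could look like; the function names, weights (`w_d`, `w_a`), thresholds, and the sigmoid blending are illustrative assumptions, not the authors' definitions.

```python
import math

def posture_reward(ee_pos, target_pos, ee_dir, target_dir, w_d=1.0, w_a=0.5):
    """Hypothetical dense posture reward (assumed form, not the paper's):
    penalize end-effector distance to the target and reward alignment of
    the approach direction with the desired direction."""
    dist = math.dist(ee_pos, target_pos)               # Euclidean distance term
    dot = sum(a * b for a, b in zip(ee_dir, target_dir))
    align = dot / (math.hypot(*ee_dir) * math.hypot(*target_dir))  # cosine similarity
    return -w_d * dist + w_a * align

def soft_stage_incentive(dist, thresholds=(0.3, 0.1, 0.02),
                         bonuses=(0.5, 1.0, 2.0), k=20.0):
    """Hypothetical soft stage incentive: each stage's bonus is blended in
    smoothly with a sigmoid as the distance drops below its threshold,
    avoiding the reward discontinuities of a hard (step-function) variant."""
    return sum(b / (1.0 + math.exp(k * (dist - t)))
               for t, b in zip(thresholds, bonuses))
```

A hard stage incentive would replace each sigmoid with a step function (`b if dist < t else 0.0`); the soft version keeps the total reward continuous in the distance, which is the kind of smoothness that typically stabilizes policy-gradient training.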
Updated: 2020-09-28