Deep reinforcement learning-based safe interaction for industrial human-robot collaboration using intrinsic reward function
Advanced Engineering Informatics (IF 8.0), Pub Date: 2021-07-17, DOI: 10.1016/j.aei.2021.101360
Quan Liu 1,2, Zhihao Liu 1,2,3, Bo Xiong 1,2, Wenjun Xu 1,2, Yang Liu 4,5

In human-robot collaboration in manufacturing, the operator's safety is the primary concern during manufacturing operations. This paper presents a deep reinforcement learning approach to real-time, collision-free motion planning of an industrial robot for human-robot collaboration. First, the safe human-robot collaborative manufacturing problem is formulated as a Markov decision process, and a mathematical expression of the reward function design problem is given. The goal is for the robot to autonomously learn a policy that reduces the accumulated risk while assuring the task completion time during human-robot collaboration. To transform this optimization objective into a reward function that guides the robot toward the expected behaviour, a reward function optimization approach based on the deterministic policy gradient is proposed to learn a parameterized intrinsic reward function. The reward function from which the agent learns its policy is the sum of the intrinsic reward function and the extrinsic reward function. Then, a deep reinforcement learning algorithm, intrinsic reward-deep deterministic policy gradient (IRDDPG), which combines the DDPG algorithm with the reward function optimization approach, is proposed to learn the expected collision-avoidance policy. Finally, the proposed algorithm is tested in a simulation environment, and the results show that the industrial robot can learn the expected policy, achieving safety assurance for industrial human-robot collaboration without missing the original target. Moreover, the reward function optimization approach compensates for deficiencies in the manually designed reward function and improves policy performance.
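The abstract states that the agent's learning signal is the sum of a parameterized intrinsic reward and an extrinsic reward. A minimal sketch of that composition, assuming an illustrative linear task reward and an inverse-distance safety-shaping term (the function names, the specific functional forms, and the weight `theta` are assumptions for illustration, not the paper's actual formulation):

```python
def extrinsic_reward(distance_to_goal):
    """Extrinsic (task) reward: closer to the goal is better (assumed form)."""
    return -distance_to_goal

def intrinsic_reward(distance_to_human, theta):
    """Parameterized intrinsic (safety-shaping) reward: penalize proximity
    to the human operator, with learnable weight theta (assumed form)."""
    return -theta / max(distance_to_human, 1e-6)

def total_reward(distance_to_goal, distance_to_human, theta):
    """Learning signal = extrinsic + intrinsic, as described in the abstract."""
    return extrinsic_reward(distance_to_goal) + intrinsic_reward(distance_to_human, theta)

# At equal task progress, a state near the human is penalized more heavily:
safe = total_reward(distance_to_goal=0.5, distance_to_human=1.0, theta=0.2)   # -0.7
risky = total_reward(distance_to_goal=0.5, distance_to_human=0.1, theta=0.2)  # -2.5
```

In the paper's approach, `theta` would itself be optimized via the deterministic policy gradient rather than hand-tuned, which is what allows the learned intrinsic term to compensate for a manually designed extrinsic reward.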




Updated: 2021-07-18