Optimizing task scheduling in human-robot collaboration with deep multi-agent reinforcement learning
Journal of Manufacturing Systems (IF 12.2), Pub Date: 2021-07-15, DOI: 10.1016/j.jmsy.2021.07.015
Tian Yu, Jing Huang, Qing Chang

Human-Robot Collaboration (HRC) presents an opportunity to improve the efficiency of manufacturing processes. However, existing task planning approaches for HRC are still limited in many ways; for example, co-robot encoding must rely on experts' knowledge, and real-time task scheduling is applicable only to small state-action spaces or simplified problem settings. In this paper, the HRC assembly process is formulated as a novel chessboard setting, in which the selection of a chess move is used as an analogy for the decision making of both humans and robots during HRC assembly. To optimize the completion time, a Markov game model is considered, which takes the task structure and the agent status as the state input and the overall completion time as the reward. Without experts' knowledge, this game model is capable of converging to a correlated equilibrium policy among agents while making real-time decisions in a dynamic environment. To improve the efficiency of finding an optimal task-scheduling policy, a deep Q-network (DQN) based multi-agent reinforcement learning (MARL) method is applied and compared with Nash-Q learning, dynamic programming, and a DQN-based single-agent reinforcement learning method. A height-adjustable desk assembly is used as a case study to demonstrate the effectiveness of the proposed algorithm with different numbers of tasks and agents.
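To make the setup concrete, below is a minimal sketch of how a DQN-based multi-agent task scheduler of this kind might be structured. It assumes a simplified environment in which the state is a vector of task-completion and agent-busy flags, each agent's action is the choice of the next unfinished task, and the reward is the negative overall completion time. All names (QNet, select_action, train_step) and the problem sizes are hypothetical illustrations, not the authors' implementation.

# Minimal sketch of DQN-based MARL for HRC task scheduling (illustrative,
# not the paper's implementation). An environment loop (not shown) would
# append (state, joint_action, reward, next_state, done) tuples to buffer.
import random
from collections import deque

import torch
import torch.nn as nn

N_TASKS, N_AGENTS = 6, 2          # hypothetical problem size
STATE_DIM = N_TASKS + N_AGENTS    # task-completion flags + agent-busy flags

class QNet(nn.Module):
    """Per-agent Q-network: state -> Q-value for each candidate task."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, N_TASKS))

    def forward(self, s):
        return self.net(s)

qnets = [QNet() for _ in range(N_AGENTS)]
optims = [torch.optim.Adam(q.parameters(), lr=1e-3) for q in qnets]
buffer = deque(maxlen=10_000)     # shared replay buffer of joint transitions

def select_action(agent, state, eps=0.1):
    """Epsilon-greedy selection of the next task among unfinished ones."""
    unfinished = [t for t in range(N_TASKS) if state[t] == 0]
    if random.random() < eps:
        return random.choice(unfinished)
    with torch.no_grad():
        q = qnets[agent](torch.tensor(state, dtype=torch.float32))
    return max(unfinished, key=lambda t: q[t].item())

def train_step(batch_size=32, gamma=0.99):
    """One DQN update per agent from a minibatch of joint transitions."""
    if len(buffer) < batch_size:
        return
    batch = random.sample(buffer, batch_size)
    for agent in range(N_AGENTS):
        s    = torch.tensor([b[0] for b in batch], dtype=torch.float32)
        a    = torch.tensor([b[1][agent] for b in batch])
        r    = torch.tensor([b[2] for b in batch], dtype=torch.float32)
        s2   = torch.tensor([b[3] for b in batch], dtype=torch.float32)
        done = torch.tensor([b[4] for b in batch], dtype=torch.float32)
        q = qnets[agent](s).gather(1, a.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            target = r + gamma * (1 - done) * qnets[agent](s2).max(1).values
        loss = nn.functional.mse_loss(q, target)
        optims[agent].zero_grad()
        loss.backward()
        optims[agent].step()

Note that this sketch lets each agent act independently via epsilon-greedy choice; in the paper's full Markov game formulation, the joint action selection instead follows a correlated equilibrium policy among the agents.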




Updated: 2021-07-15