Human and Multi-Agent collaboration in a human-MARL teaming framework
arXiv - CS - Multiagent Systems. Pub Date: 2020-06-12, DOI: arxiv-2006.07301
Neda Navidi, Francois Chabot, Sagar Kurandwad, Irv Lustigman, Vincent Robert, Gregory Szriftgiser, Andrea Schuch

Collaborative multi-agent reinforcement learning (MARL), a specific category of reinforcement learning, yields effective results with agents learning from their observations, received rewards, and interactions with other agents. However, centralized learning methods with a joint global policy in a highly dynamic environment present unique challenges in dealing with large amounts of information. This study proposes two innovative solutions to address the complexities of collaboration between a human and multiple reinforcement learning (RL)-based agents (referred to hereafter as Human-MARL teaming), where the pursued goals cannot be achieved by either the human or the agents alone. The first innovation is the introduction of a new open-source MARL framework, called COGMENT, to unite humans and agents in real-time complex dynamic systems and efficiently leverage their interactions as a source of learning. The second innovation is a new hybrid MARL method, named Dueling Double Deep Q-learning MADDPG (D3-MADDPG), which allows agents to train decentralized policies in parallel under a joint centralized policy. This method addresses the Q-value overestimation problem of value-based MARL methods. We demonstrate these innovations in a purpose-built real-time environment in which unmanned aerial vehicles driven by RL agents collaborate with a human to fight fires. The team of RL-driven drones autonomously locates seats of fire, and the human pilot douses them. The results of this study show that the proposed collaborative paradigm and the open-source framework lead to significant reductions in both human effort and exploration costs. The results also show that the proposed hybrid MARL method improves the learning process, achieving more reliable Q-values for each action by decoupling the estimation of the state value from that of the advantage value.
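
The two value-learning ideas the abstract combines, a dueling head that decouples state value from advantage, and a double-Q target that curbs overestimation, can be summarized in a short sketch. The following PyTorch snippet is illustrative only: names such as DuelingQNet, obs_dim, and double_q_target are our own, and it does not reproduce the paper's actual per-agent critics inside MADDPG.

```python
# Minimal sketch of the dueling decomposition and double-Q target
# underlying D3-MADDPG-style value learning. Sizes and names are
# assumptions for illustration, not the paper's architecture.
import torch
import torch.nn as nn


class DuelingQNet(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # state value V(s)
        self.advantage = nn.Linear(hidden, n_actions)   # advantage A(s, a)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        h = self.trunk(obs)
        v = self.value(h)
        a = self.advantage(h)
        # Dueling aggregation: Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a)).
        # Subtracting the mean keeps V and A identifiable.
        return v + a - a.mean(dim=1, keepdim=True)


def double_q_target(online: DuelingQNet, target: DuelingQNet,
                    reward: torch.Tensor, next_obs: torch.Tensor,
                    done: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    # Double DQN target: the online net selects the greedy next action,
    # the target net evaluates it, which reduces overestimation bias.
    with torch.no_grad():
        next_action = online(next_obs).argmax(dim=1, keepdim=True)
        next_q = target(next_obs).gather(1, next_action).squeeze(1)
        return reward + gamma * (1.0 - done) * next_q
```

In a MADDPG-style setup, each agent would hold such a critic trained centrally over joint observations while executing its own decentralized policy; the sketch above shows only the single-critic value computation.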

Updated: 2020-06-15