Deep Implicit Coordination Graphs for Multi-agent Reinforcement Learning,arXiv - CS - Multiagent Systems



Deep Implicit Coordination Graphs for Multi-agent Reinforcement Learning
arXiv - CS - Multiagent Systems. Pub Date: 2020-06-19, DOI: arxiv-2006.11438
Sheng Li, Jayesh K. Gupta, Peter Morales, Ross Allen, Mykel J. Kochenderfer

Multi-agent reinforcement learning (MARL) requires coordination to solve certain tasks efficiently. Fully centralized control is often infeasible in such domains because of the size of the joint action space. Coordination-graph-based formalizations allow reasoning about the joint action based on the structure of interactions. However, they often require domain expertise to design. This paper introduces the deep implicit coordination graph (DICG) architecture for such scenarios. DICG consists of a module that infers the dynamic coordination graph structure, which a graph-neural-network-based module then uses to learn to reason implicitly about joint actions or values. DICG allows learning the tradeoff between full centralization and decentralization via standard actor-critic methods, significantly improving coordination in domains with a large number of agents. We apply DICG to both the centralized-training-centralized-execution and centralized-training-decentralized-execution regimes. We demonstrate that DICG solves the relative overgeneralization pathology in predator-prey tasks and outperforms various MARL baselines on the challenging StarCraft II Multi-agent Challenge (SMAC) and traffic junction environments.
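The two-module structure described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the graph is inferred or which graph neural network is used, so everything below is an assumption made for illustration: a self-attention score produces a soft adjacency matrix, a single graph convolution passes messages over it, and a residual connection lets the output fall back on each agent's own embedding. The names `infer_graph`, `graph_conv`, `dicg_layer`, `w_attn`, and `w_conv` are hypothetical and are not taken from the paper.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def infer_graph(embeddings, w_attn):
    """Assumed graph-inference module: a soft adjacency matrix from attention.

    Entry [i][j] is how strongly agent i attends to agent j.
    """
    projected = matmul(embeddings, w_attn)              # (n_agents, dim)
    transposed = [list(col) for col in zip(*embeddings)]  # (dim, n_agents)
    scores = matmul(projected, transposed)              # (n_agents, n_agents)
    return [softmax(row) for row in scores]

def graph_conv(adjacency, embeddings, w_conv):
    """Assumed GNN module: mix neighbour embeddings, project, apply ReLU."""
    mixed = matmul(adjacency, embeddings)
    return [[max(0.0, v) for v in row] for row in matmul(mixed, w_conv)]

def dicg_layer(embeddings, w_attn, w_conv):
    """Infer the graph, run one graph convolution, add a residual connection."""
    adjacency = infer_graph(embeddings, w_attn)
    messages = graph_conv(adjacency, embeddings, w_conv)
    combined = [[e + m for e, m in zip(e_row, m_row)]
                for e_row, m_row in zip(embeddings, messages)]
    return adjacency, combined

def random_matrix(rows, cols, rng):
    """Untrained placeholder weights; a real model would learn these."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n_agents, dim = 4, 3
embeddings = random_matrix(n_agents, dim, rng)  # stand-in for encoded observations
w_attn = random_matrix(dim, dim, rng)
w_conv = random_matrix(dim, dim, rng)
adjacency, combined = dicg_layer(embeddings, w_attn, w_conv)
```

In this sketch each row of `adjacency` sums to one, so it acts as a learned, state-dependent weighting over the other agents rather than a hand-designed graph. The rows of `combined` would then feed a policy or value head trained with an actor-critic method.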


Updated: 2020-06-23
