Learning to Communicate Using Counterfactual Reasoning
arXiv - CS - Multiagent Systems Pub Date : 2020-06-12 , DOI: arxiv-2006.07200 Simon Vanneste, Astrid Vanneste, Siegfried Mercelis and Peter Hellinckx
This paper introduces a new approach for multi-agent communication learning
called multi-agent counterfactual communication (MACC) learning. Many
real-world problems are currently tackled using multi-agent techniques.
However, in many of these tasks the agents do not observe the full state of the
environment but only receive a limited observation. This absence of knowledge about the
full state makes completing the objectives significantly more complex or even
impossible. The key to this problem lies in sharing observation information
between agents or learning how to communicate the essential data. In this paper
we present a novel multi-agent communication learning approach called MACC. It
addresses the partial observability problem of the agents. MACC lets the agent
learn the action policy and the communication policy simultaneously. We focus
on decentralized Markov Decision Processes (Dec-MDP), where the agents have
joint observability. This means that the full state of the environment can be
determined using the observations of all agents. MACC uses counterfactual
reasoning to train both the action and the communication policy. This allows
the agents to anticipate how other agents will react to certain messages and
how the environment will react to certain actions, enabling them to learn
more effective policies. MACC uses actor-critic with a centralized critic and
decentralized actors. The critic is used to calculate an advantage for both the
action and communication policies. We demonstrate our method by applying it to
OpenAI's Simple Reference particle environment and an MNIST game. We compare our
results against communication and non-communication baselines. These
experiments show that MACC trains agents with effective communication policies
for each of these problems.
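The abstract does not spell out how the counterfactual advantage is computed. A minimal sketch of the COMA-style counterfactual baseline this family of methods typically uses, assuming a discrete action space and a centralized critic that outputs per-action Q-values (function and variable names here are illustrative, not from the paper):

```python
import numpy as np

def counterfactual_advantage(q_values: np.ndarray, policy: np.ndarray) -> np.ndarray:
    """Counterfactual advantage for each possible action of one agent.

    q_values: the centralized critic's Q(s, u) for every action u of this
              agent, with the other agents' actions held fixed.
    policy:   this agent's current action probabilities pi(u | obs).

    Returns A(s, u) = Q(s, u) - sum_u' pi(u' | obs) * Q(s, u'),
    i.e. the gain of each action over a counterfactual baseline that
    marginalizes out this agent's own action. The same construction can
    be applied to a discrete message space to get a communication advantage.
    """
    baseline = float(np.dot(policy, q_values))  # expected Q under the agent's policy
    return q_values - baseline

# Example with 3 discrete actions
q = np.array([1.0, 2.0, 0.5])
pi = np.array([0.2, 0.5, 0.3])
adv = counterfactual_advantage(q, pi)
```

By construction the advantage is zero in expectation under the agent's own policy, which keeps the policy-gradient estimate unbiased while reducing variance.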
Updated: 2020-06-15