Option-Critic in Cooperative Multi-agent Systems
arXiv - CS - Multiagent Systems, Pub Date: 2019-11-28, DOI: arxiv-1911.12825
Jhelum Chakravorty, Nadeem Ward, Julien Roy, Maxime Chevalier-Boisvert, Sumana Basu, Andrei Lupu, Doina Precup

In this paper, we investigate learning temporal abstractions in cooperative multi-agent systems using the options framework (Sutton et al., 1999). First, we address the planning problem for the decentralized POMDP represented by the multi-agent system by introducing a common information approach. We use the notion of common beliefs and broadcasting to solve an equivalent centralized POMDP problem. Then, we propose the Distributed Option Critic (DOC) algorithm, which uses centralized option evaluation and decentralized intra-option improvement. We theoretically analyze the asymptotic convergence of DOC and build a new multi-agent environment to demonstrate its validity. Our experiments empirically show that DOC performs competitively against baselines and scales with the number of agents.
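
For readers unfamiliar with the options framework the paper builds on, the sketch below illustrates single-agent "call-and-return" option execution in the sense of Sutton et al. (1999): an option is an initiation set, an intra-option policy and a termination probability, and the agent follows the current option until its termination fires. This is not the DOC algorithm and contains no learning, common beliefs or broadcasting. The names `Option` and `run_options`, the 1-D chain environment and the uniform policy over options are all hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Option:
    name: str
    initiation: Callable[[int], bool]    # I_omega: may the option start in this state?
    policy: Callable[[int], int]         # pi_omega: intra-option policy, state -> action
    termination: Callable[[int], float]  # beta_omega: probability of stopping in a state


def step(state: int, action: int, n_states: int) -> int:
    # Hypothetical 1-D chain: action is -1 or +1, clipped to [0, n_states - 1].
    return max(0, min(n_states - 1, state + action))


def run_options(options: List[Option], start: int, goal: int, n_states: int,
                rng: random.Random, max_steps: int = 100) -> List[Tuple[int, str]]:
    """Call-and-return execution: follow one option until its beta fires, then pick anew.

    Assumes every state lies in the initiation set of at least one option.
    """
    state = start
    trace: List[Tuple[int, str]] = []
    current: Optional[Option] = None
    for _ in range(max_steps):
        if state == goal:
            break
        if current is None:
            # Hypothetical policy over options: uniform over options available here.
            available = [o for o in options if o.initiation(state)]
            current = rng.choice(available)
        state = step(state, current.policy(state), n_states)
        trace.append((state, current.name))
        # Sample termination in the newly reached state.
        if rng.random() < current.termination(state):
            current = None
    return trace


if __name__ == "__main__":
    right = Option("right", lambda s: True, lambda s: 1, lambda s: 0.0)
    print(run_options([right], start=0, goal=9, n_states=10, rng=random.Random(0)))
    # 9 steps, ending at (9, 'right')
```

With a single never-terminating "right" option the agent walks straight to the goal; adding options with nonzero termination probabilities makes the agent re-select among options mid-episode, which is the temporal structure that option-critic methods learn.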

Updated: 2020-03-23
