Option-Critic in Cooperative Multi-agent Systems
arXiv - CS - Multiagent Systems Pub Date : 2019-11-28 , DOI: arxiv-1911.12825
Jhelum Chakravorty, Nadeem Ward, Julien Roy, Maxime Chevalier-Boisvert, Sumana Basu, Andrei Lupu, Doina Precup

In this paper, we investigate learning temporal abstractions in cooperative multi-agent systems, using the options framework (Sutton et al., 1999). First, we address the planning problem for the decentralized POMDP represented by the multi-agent system, by introducing a common information approach. We use the notion of common beliefs and broadcasting to solve an equivalent centralized POMDP problem. Then, we propose the Distributed Option Critic (DOC) algorithm, which uses centralized option evaluation and decentralized intra-option improvement. We theoretically analyze the asymptotic convergence of DOC and build a new multi-agent environment to demonstrate its validity. Our experiments empirically show that DOC performs competitively against baselines and scales with the number of agents.

Updated: 2020-03-23