Multi-Agent Common Knowledge Reinforcement Learning
arXiv - CS - Computer Science and Game Theory. Pub Date: 2018-10-27, DOI: arxiv-1810.11702
Christian A. Schroeder de Witt, Jakob N. Foerster, Gregory Farquhar, Philip H. S. Torr, Wendelin Boehmer, Shimon Whiteson

Cooperative multi-agent reinforcement learning often requires decentralised policies, which severely limit the agents' ability to coordinate their behaviour. In this paper, we show that common knowledge between agents allows for complex decentralised coordination. Common knowledge arises naturally in a large number of decentralised cooperative multi-agent tasks, for example, when agents can reconstruct parts of each other's observations. Since agents can independently agree on their common knowledge, they can execute complex coordinated policies that condition on this knowledge in a fully decentralised fashion. We propose multi-agent common knowledge reinforcement learning (MACKRL), a novel stochastic actor-critic algorithm that learns a hierarchical policy tree. Higher levels in the hierarchy coordinate groups of agents by conditioning on their common knowledge, or delegate to lower levels with smaller subgroups but potentially richer common knowledge. The entire policy tree can be executed in a fully decentralised fashion. As the lowest policy tree level consists of independent policies for each agent, MACKRL reduces to independently learnt decentralised policies as a special case. We demonstrate that our method can exploit common knowledge for superior performance on complex decentralised coordination tasks, including a stochastic matrix game and challenging problems in StarCraft II unit micromanagement.
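To make the hierarchical policy tree concrete, below is a minimal two-agent sketch in PyTorch. It is an illustration only, not the paper's implementation: the class name PairController, the network sizes, and the flat common-knowledge vector are all assumptions, and training via the stochastic actor-critic is omitted. The top level conditions only on the pair's common knowledge and either samples a joint action for both agents or takes an extra "delegate" option that hands control to the independent per-agent policies at the lowest level.

```python
import torch
import torch.nn as nn

class PairController(nn.Module):
    """Two-agent sketch of a MACKRL-style policy tree (hypothetical
    simplification; names and sizes are illustrative, not the paper's)."""

    def __init__(self, ck_dim: int, obs_dim: int, n_actions: int):
        super().__init__()
        self.n_actions = n_actions
        # Top level: conditions only on the pair's common knowledge, so
        # both agents can evaluate it identically. Outputs logits over all
        # joint actions plus one extra "delegate" option.
        self.pair_level = nn.Sequential(
            nn.Linear(ck_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions * n_actions + 1),
        )
        # Lowest level: an independent policy over each agent's private
        # observation (weights shared across agents here for brevity).
        self.solo_level = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def act(self, common_knowledge, obs_a, obs_b):
        # In decentralised execution, each agent runs this same sampling
        # step on the identical common-knowledge input with a shared
        # random seed, so the top-level choice agrees across agents.
        choice = torch.distributions.Categorical(
            logits=self.pair_level(common_knowledge)).sample()
        if choice == self.n_actions ** 2:
            # Delegate: fall back to independent per-agent policies,
            # which may condition on private (non-common) observations.
            act_a = torch.distributions.Categorical(
                logits=self.solo_level(obs_a)).sample()
            act_b = torch.distributions.Categorical(
                logits=self.solo_level(obs_b)).sample()
        else:
            # Coordinate: decode one joint action for the whole pair.
            act_a = choice // self.n_actions
            act_b = choice % self.n_actions
        return int(act_a), int(act_b)

# Usage: every agent builds the same controller and calls act() with the
# same common-knowledge vector and seed, without communicating.
ctrl = PairController(ck_dim=8, obs_dim=16, n_actions=5)
torch.manual_seed(0)  # stands in for a shared source of randomness
print(ctrl.act(torch.randn(8), torch.randn(16), torch.randn(16)))
```

Because the top level sees only inputs both agents share, the agents can arrive at the same coordinated choice independently; if the delegate option is always taken, the sketch reduces to independent decentralised policies, matching the special case noted in the abstract.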

Updated: 2020-01-14