Learning multi-agent communication with double attentional deep reinforcement learning
Autonomous Agents and Multi-Agent Systems (IF 1.9) Pub Date: 2020-03-25, DOI: 10.1007/s10458-020-09455-w
Hangyu Mao, Zhengchao Zhang, Zhen Xiao, Zhibo Gong, Yan Ni

Communication is a critical factor for large multi-agent systems to stay organized and productive. Recently, Deep Reinforcement Learning (DRL) has been adopted to learn communication among multiple intelligent agents. However, in the DRL setting, a growing number of communication messages introduces two problems: (1) some messages are usually redundant; (2) even when all messages are necessary, processing a large number of messages efficiently remains a major challenge. In this paper, we propose a DRL method named Double Attentional Actor-Critic Message Processor (DAACMP) to jointly address these two problems. Specifically, DAACMP adopts two attention mechanisms. The first is embedded in the actor part, so that it can adaptively select the important messages from all communication messages. The other is embedded in the critic part, so that all important messages can be processed efficiently. We evaluate DAACMP on three multi-agent tasks with seven different settings. Results show that DAACMP not only outperforms several state-of-the-art methods but also achieves better scalability in all tasks. Furthermore, we conduct experiments to reveal some insights about the proposed attention mechanisms and the learned policies.
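The abstract does not give the concrete form of the two attention mechanisms. As a rough illustration only (not the authors' implementation), the core idea of adaptively weighting incoming messages can be sketched as a scaled dot-product attention step, where an agent's own hidden state acts as the query over the messages it receives; the function and variable names here are hypothetical:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_to_messages(query, messages):
    """Attention-weighted aggregation of communication messages.

    query:    (d,) hidden state of the receiving agent.
    messages: (n, d) messages from the n other agents.
    Returns the aggregated message (d,) and the attention weights (n,),
    which sum to 1 and indicate each message's learned importance.
    """
    d = query.shape[0]
    scores = messages @ query / np.sqrt(d)   # (n,) relevance scores
    weights = softmax(scores)                # adaptive importance weights
    aggregated = weights @ messages          # (d,) weighted sum of messages
    return aggregated, weights

# Example: one agent attending to messages from 5 peers.
rng = np.random.default_rng(0)
msgs = rng.normal(size=(5, 8))
q = rng.normal(size=8)
agg, w = attend_to_messages(q, msgs)
```

In a full actor-critic architecture such as the one described, a step like this would appear twice with separately learned parameters: once in the actor to filter redundant messages before action selection, and once in the critic to process the selected messages when estimating values.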
