Exploring communication protocols and centralized critics in multi-agent deep learning
Integrated Computer-Aided Engineering (IF 5.8), Pub Date: 2020-05-18, DOI: 10.3233/ica-200631
David Simões, Nuno Lau, Luís Paulo Reis

Tackling multi-agent environments where each agent has only a limited local observation of the global state is a non-trivial task that often requires hand-tuned solutions. A team of agents coordinating in such scenarios must handle the complex underlying environment while each agent has only partial knowledge of it. Deep reinforcement learning has been shown to achieve super-human performance in single-agent environments, and has since been adapted to the multi-agent paradigm. This paper proposes A3C3, a multi-agent deep learning algorithm in which agents are evaluated by a centralized referee during the learning phase but remain independent from each other during actual execution. The referee's neural network is augmented with a permutation-invariance architecture to increase its scalability to large teams. A3C3 also allows agents to learn communication protocols with which they share relevant information with their team members, allowing them to overcome their limited knowledge and achieve coordination. A3C3 and its permutation-invariant augmentation are evaluated in multiple multi-agent test-beds, including partially observable scenarios, swarm environments, and complex 3D soccer simulations.
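The two architectural ideas named in the abstract, inter-agent message passing and a permutation-invariant centralized critic, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the dimensions, weight matrices, and function names below are hypothetical, and pooling by mean over per-agent embeddings is just one common way to obtain permutation invariance. The sketch shows that a critic built this way assigns the same value to a team state regardless of agent ordering.

```python
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS, OBS_DIM, MSG_DIM, HID = 4, 6, 3, 8

# Hypothetical shared parameters; in A3C3-style training these would be learned.
W_msg = rng.normal(size=(OBS_DIM, MSG_DIM))        # communication head
W_emb = rng.normal(size=(OBS_DIM + MSG_DIM, HID))  # critic's per-agent embedding
w_val = rng.normal(size=HID)                       # critic's value head

def messages(obs):
    """Each agent broadcasts a message computed from its local observation."""
    return np.tanh(obs @ W_msg)                    # shape (N_AGENTS, MSG_DIM)

def critic_value(obs):
    """Permutation-invariant centralized critic: embed each agent's
    (observation, mean incoming message) pair, then mean-pool across
    agents before the value head, so agent order cannot matter."""
    msg = messages(obs)
    # Mean of teammates' messages, excluding each agent's own.
    incoming = (msg.sum(axis=0, keepdims=True) - msg) / (len(obs) - 1)
    emb = np.tanh(np.concatenate([obs, incoming], axis=1) @ W_emb)
    pooled = emb.mean(axis=0)                      # order-independent pooling
    return float(pooled @ w_val)

obs = rng.normal(size=(N_AGENTS, OBS_DIM))
v = critic_value(obs)
v_shuffled = critic_value(obs[::-1])               # reorder the team
assert np.isclose(v, v_shuffled)                   # value unchanged by reordering
```

Mean pooling is what makes the critic scale to larger teams: the number of critic parameters is independent of team size, and a team of any size can be evaluated by the same network.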

Updated: 2020-06-30