Improving coordination in small-scale multi-agent deep reinforcement learning through memory-driven communication
Machine Learning (IF 4.3), Pub Date: 2020-01-23, DOI: 10.1007/s10994-019-05864-5
Emanuele Pesce, Giovanni Montana

Deep reinforcement learning algorithms have recently been used to train multiple interacting agents in a centralised manner whilst keeping their execution decentralised. When the agents can only acquire partial observations and are faced with tasks requiring coordination and synchronisation skills, inter-agent communication plays an essential role. In this work, we propose a framework for multi-agent training using deep deterministic policy gradients that enables concurrent, end-to-end learning of an explicit communication protocol through a memory device. During training, the agents learn to perform read and write operations enabling them to infer a shared representation of the world. We empirically demonstrate that concurrent learning of the communication device and individual policies can improve inter-agent coordination and performance in small-scale systems. Our experimental results show that the proposed method achieves superior performance in scenarios with up to six agents. We illustrate how different communication patterns can emerge on six different tasks of increasing complexity. Furthermore, we study the effects of corrupting the communication channel, provide a visualisation of the time-varying memory content as the underlying task is being solved and validate the building blocks of the proposed memory device through ablation studies.
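The core idea of the abstract — agents that read a shared memory alongside their partial observations and write gated updates back to it — can be illustrated with a minimal sketch. This is not the authors' exact architecture: the class name `MemoryChannel`, the gating scheme, and the randomly initialised weight matrices (which would be learned end-to-end in the actual framework) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class MemoryChannel:
    """Sketch of a shared memory device: each agent reads the current
    memory alongside its observation, then writes a gated update.
    Weights are randomly initialised here; in the paper's setting they
    would be trained jointly with the policies."""

    def __init__(self, mem_dim, obs_dim):
        self.m = np.zeros(mem_dim)  # shared memory vector
        # hypothetical learned parameters of the write operation
        self.W_write = rng.standard_normal((mem_dim, obs_dim)) * 0.1
        self.W_gate = rng.standard_normal((mem_dim, obs_dim)) * 0.1

    def read(self, obs):
        # each agent's policy conditions on [observation, memory]
        return np.concatenate([obs, self.m])

    def write(self, obs):
        # sigmoid gate decides how strongly the agent overwrites memory
        g = 1.0 / (1.0 + np.exp(-self.W_gate @ obs))
        cand = np.tanh(self.W_write @ obs)  # candidate memory content
        self.m = (1.0 - g) * self.m + g * cand
        return self.m

channel = MemoryChannel(mem_dim=4, obs_dim=3)
for step in range(2):                # two agents act in sequence
    obs = rng.standard_normal(3)     # a partial observation
    joint_input = channel.read(obs)  # policy input: obs + memory
    channel.write(obs)               # leave a trace for other agents

print(joint_input.shape)  # (7,) = obs_dim + mem_dim
```

Because each agent's write is folded into the memory before the next agent reads, later agents can infer information about states they never observed directly, which is the mechanism the abstract credits for improved coordination.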

Updated: 2020-01-23