SA-MATD3: Self-attention-based multi-agent continuous control method in cooperative environments
arXiv - CS - Multiagent Systems. Pub Date: 2021-07-01, DOI: arxiv-2107.00284
Kai Liu, Yuyang Zhao, Gang Wang, Bei Peng

Cooperative problems under continuous control have long been a focus of multi-agent reinforcement learning. Existing algorithms suffer from increasingly uneven learning across agents as the number of agents grows. In this paper, a new multi-agent actor-critic structure is proposed: a self-attention mechanism is applied in the critic network, and a value decomposition method is used to resolve the uneven-learning problem. The proposed algorithm makes full use of the samples in the replay buffer to learn the behavior of a class of agents. First, a new update method for the policy networks is proposed that improves learning efficiency. Second, sample utilization is improved while reflecting the agents' capacity for perspective-taking within groups. Finally, the "deceptive signal" in training is eliminated, and learning proceeds more uniformly across agents than under existing methods. Multiple experiments were conducted in two typical scenarios of the multi-agent particle environment. Experimental results show that the proposed algorithm outperforms state-of-the-art methods and exhibits higher learning efficiency as the number of agents increases.
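
To make the architecture described in the abstract concrete, the sketch below shows a centralized critic in which each agent's (observation, action) pair is embedded, the embeddings attend over one another via self-attention, and the per-agent values are summed in a value-decomposition style. This is a minimal hypothetical PyTorch sketch based only on the abstract; the class name, dimensions, shared encoder, and summation rule are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class SelfAttentionCritic(nn.Module):
    """Centralized critic sketch: per-agent embeddings attend over all
    agents, and per-agent values are summed into a joint estimate
    (a value-decomposition-style Q_tot = sum_i Q_i). Illustrative only."""

    def __init__(self, obs_dim, act_dim, embed_dim=64, n_heads=4):
        super().__init__()
        # Encoder shared across agents; parameter sharing is one way a
        # single network can learn the behavior of a class of agents.
        self.encoder = nn.Linear(obs_dim + act_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        self.q_head = nn.Linear(embed_dim, 1)

    def forward(self, obs, act):
        # obs: (batch, n_agents, obs_dim), act: (batch, n_agents, act_dim)
        x = torch.relu(self.encoder(torch.cat([obs, act], dim=-1)))
        attn_out, _ = self.attn(x, x, x)           # self-attention over agents
        q_i = self.q_head(attn_out).squeeze(-1)    # per-agent values, (batch, n_agents)
        return q_i, q_i.sum(dim=-1, keepdim=True)  # Q_i and decomposed Q_tot

In a TD3-style multi-agent setup, the per-agent values q_i could drive individual critic targets while the summed estimate coordinates the joint update; the exact training rule in the paper may differ.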

Updated: 2021-07-02