Learning Communication for Cooperation in Dynamic Agent-Number Environment
IEEE/ASME Transactions on Mechatronics (IF 6.4), Pub Date: 2021-04-27, DOI: 10.1109/tmech.2021.3076080
Weiwei Liu, Shanqi Liu, Junjie Cao, Qi Wang, Xiaolei Lang, Yong Liu

The number of agents in many real-world multiagent systems, such as warehouse robots and drone swarm systems, changes continually. Yet most current multiagent reinforcement learning (RL) algorithms are limited to fixed network dimensions and use prior knowledge to preset the number of agents in the training phase, which leads to poor generalization. In addition, these algorithms use centralized training to resolve the instability of multiagent systems. However, centralized learning for large-scale multiagent RL causes an explosion of network dimensions, which in turn severely limits the scalability of centralized learning algorithms. To address these two difficulties, this article proposes a group centralized training and decentralized execution, unlimited dynamic agent-number network (GCTDE-UDAN). First, because we use an attention mechanism to select several leaders and form a dynamic number of teams, and because UDAN performs a nonlinear combination of all agents' Q-values during value decomposition, the method is unaffected by changes in the number of agents. Moreover, our algorithm can unite arbitrary agents into a group and conduct centralized training within that group, avoiding the network-dimension explosion caused by globally centralized training of large-scale agent populations. Finally, we verify on simulation and experimental platforms that the algorithm can learn and perform cooperative behaviors in many dynamic multiagent environments.
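The abstract highlights two mechanisms: attention-based leader selection that forms a dynamic number of teams, and a nonlinear combination of per-agent Q-values that does not depend on a fixed agent count. Below is a minimal PyTorch sketch of both ideas; the module names, dimensions, and the sum-pooling mixer are illustrative assumptions, not the authors' published GCTDE-UDAN architecture.

```python
# Sketch of (1) attention scoring over agent embeddings to pick leaders and
# (2) a nonlinear, permutation-invariant mixing of per-agent Q-values, so the
# same weights handle any number of agents. Hypothetical names and shapes.
import torch
import torch.nn as nn

class LeaderAttention(nn.Module):
    """Scores each agent with scaled dot-product attention against a learned query."""
    def __init__(self, obs_dim: int, embed_dim: int = 64):
        super().__init__()
        self.encode = nn.Linear(obs_dim, embed_dim)
        self.query = nn.Parameter(torch.randn(embed_dim))

    def forward(self, obs: torch.Tensor, n_leaders: int):
        # obs: (n_agents, obs_dim); n_agents may change between episodes.
        keys = self.encode(obs)                              # (n_agents, embed_dim)
        scores = keys @ self.query / keys.shape[-1] ** 0.5   # (n_agents,)
        leaders = scores.topk(min(n_leaders, obs.shape[0])).indices
        return leaders, scores

class NonlinearMixer(nn.Module):
    """Mixes per-agent Q-values into a group value with a nonlinear,
    permutation-invariant network (sum pooling), agnostic to agent count."""
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(1, hidden), nn.ReLU())
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, q_values: torch.Tensor) -> torch.Tensor:
        # q_values: (n_agents,) -> scalar group Q, for any n_agents.
        pooled = self.phi(q_values.unsqueeze(-1)).sum(dim=0)
        return self.rho(pooled).squeeze(-1)

# Usage: pick leaders to seed groups, then train each group's mixed Q.
obs = torch.randn(5, 16)                      # 5 agents this episode
leaders, _ = LeaderAttention(16)(obs, n_leaders=2)
q_tot = NonlinearMixer()(torch.randn(5))      # scalar group value
```

The key property in this sketch is the permutation-invariant sum pooling in the mixer: the same parameters apply whether a group contains five agents or fifty, which is what lets the value decomposition tolerate a changing agent count.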
