S$^{2}$ES: a stationary and scalable knowledge transfer approach for multiagent reinforcement learning, Complex & Intelligent Systems


Complex & Intelligent Systems (IF 5.8), Pub Date: 2021-07-13, DOI: 10.1007/s40747-021-00423-9
Tonghao Wang, Xingguang Peng, Demin Xu


Knowledge transfer is widely adopted to accelerate multiagent reinforcement learning (MARL). To speed up learning for MARL agents that learn from scratch, we propose a Stationary and Scalable knowledge transfer approach based on Experience Sharing (S$^{2}$ES). The framework of our approach is structured into three components: what kind of experience to share, how to learn, and when to transfer. Specifically, we first design an augmented form of experience. By sharing (i.e., transmitting) this experience from one agent to its peers, the learning speed can be effectively enhanced with guaranteed scalability. A synchronized learning pattern is then adopted, which reduces the nonstationarity brought by experience replay while retaining data efficiency. Moreover, to avoid redundant transfer once the agents' policies have converged, we further design two trigger conditions, one based on modified Q values and the other on normalized Shannon entropy, to determine when to conduct experience sharing. Empirical studies indicate that the proposed approach outperforms other knowledge transfer methods in efficacy, efficiency, and scalability. We also provide ablation experiments to demonstrate the necessity of the key ingredients.
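The entropy-based trigger mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the paper's exact formulation, so everything below is an assumption made for illustration: the softmax policy over Q values, the function names `softmax`, `normalized_entropy`, and `should_share`, and the `threshold` and `temperature` parameters are all hypothetical. The only idea taken from the abstract is that sharing should stop once a policy has converged, which here is approximated by a low normalized Shannon entropy of the action distribution.

```python
import math


def softmax(q_values, temperature=1.0):
    """Turn Q values into action probabilities (assumed policy form, not from the paper)."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Shannon entropy of a discrete distribution divided by log(n), so it lies in [0, 1]."""
    n = len(probs)
    if n < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(n)


def should_share(q_values, threshold=0.5, temperature=1.0):
    """Hypothetical trigger: share experience only while the policy is still uncertain."""
    return normalized_entropy(softmax(q_values, temperature)) > threshold


if __name__ == "__main__":
    print(should_share([0.0, 0.0, 0.0, 0.0]))   # untrained: uniform action distribution
    print(should_share([10.0, 0.0, 0.0, 0.0]))  # converged: one action dominates
```

Normalizing by log(n) keeps the threshold comparable across agents or tasks with different numbers of actions, which is one plausible reason to prefer a normalized entropy over the raw value.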


Updated: 2021-07-13
