Robust experience replay sampling for multi-agent reinforcement learning
Pattern Recognition Letters ( IF 5.1 ) Pub Date : 2021-11-08 , DOI: 10.1016/j.patrec.2021.11.006
Isack Thomas Nicholaus 1 , Dae-Ki Kang 1

Learning from relevant experiences leads to fast convergence when those experiences provide useful information. We present a new, simple, yet efficient technique for selecting suitable experience samples to train agents in a given state of an environment. We aim to increase the number of visited states and unique sequences, which efficiently reduces the number of states the agents must explore or exploit. Our technique implicitly strengthens the exploration-exploitation trade-off. It filters the experience samples that can benefit more than half of the agents and then uses those experiences to extract information useful for decision making. To achieve this filtering, we first compute the similarities between the observed state and the previous states stored in the experiences. We then filter the samples using a hyper-parameter, z, which decides which experiences are suitable. We find that agents learn quickly and efficiently, since the sampled experiences provide useful information that speeds up convergence. In every episode, most agents learn or contribute to improving the total expected future return. We further study the generalization ability of our approach and present different settings that show significant improvements across diverse experimental environments.
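The abstract does not give implementation details, but the described filtering step can be sketched roughly as follows. This is an illustrative reading, not the authors' code: the similarity measure (cosine), the buffer layout, and the majority rule over agents are all assumptions; only the threshold hyper-parameter z and the "more than half the agents" criterion come from the abstract.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two state vectors (assumed metric)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

def filter_experiences(buffer, current_states, z=0.5):
    """Keep experiences whose stored per-agent states are similar
    (similarity >= z) to the current states of more than half the agents.

    buffer: list of (states, actions, rewards, next_states) tuples, where
            `states` holds one vector per agent (shape: n_agents x dim).
    current_states: current observation of each agent (n_agents x dim).
    z: similarity threshold hyper-parameter from the abstract.
    """
    n_agents = len(current_states)
    selected = []
    for experience in buffer:
        states = experience[0]
        # Count the agents for which this stored experience is relevant.
        relevant = sum(
            cosine_similarity(states[i], current_states[i]) >= z
            for i in range(n_agents)
        )
        # Keep it only if it can benefit more than half of the agents.
        if relevant > n_agents // 2:
            selected.append(experience)
    return selected
```

In this reading, training then samples only from the filtered subset, so updates are driven by experiences relevant to the majority of agents in the current state.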




Updated: 2021-11-08