Strategically Efficient Exploration in Competitive Multi-agent Reinforcement Learning
arXiv - CS - Multiagent Systems. Pub Date: 2021-07-30, DOI: arxiv-2107.14698
Robert Loftin, Aadirupa Saha, Sam Devlin, Katja Hofmann

High sample complexity remains a barrier to the application of reinforcement learning (RL), particularly in multi-agent systems. A large body of work has demonstrated that exploration mechanisms based on the principle of optimism under uncertainty can significantly improve the sample efficiency of RL in single agent tasks. This work seeks to understand the role of optimistic exploration in non-cooperative multi-agent settings. We will show that, in zero-sum games, optimistic exploration can cause the learner to waste time sampling parts of the state space that are irrelevant to strategic play, as they can only be reached through cooperation between both players. To address this issue, we introduce a formal notion of strategically efficient exploration in Markov games, and use this to develop two strategically efficient learning algorithms for finite Markov games. We demonstrate that these methods can be significantly more sample efficient than their optimistic counterparts.
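The optimism principle the abstract refers to is most often realized as an exploration bonus that decays with visit counts, steering the learner toward under-sampled state-action pairs. Below is a minimal, illustrative sketch of count-based optimistic Q-learning on a toy chain MDP; the environment, the bonus form c/sqrt(N(s,a)), and all hyperparameters are assumptions for illustration only, not the strategically efficient algorithms the paper introduces.

```python
# A minimal sketch of optimism under uncertainty: tabular Q-learning
# with a count-based UCB-style bonus on a toy chain MDP. The MDP, the
# bonus form, and the hyperparameters are illustrative assumptions,
# not the methods developed in the paper.
import numpy as np

n_states, n_actions = 5, 2     # chain: action 1 moves right, action 0 resets
gamma, alpha, c = 0.9, 0.1, 1.0

Q = np.zeros((n_states, n_actions))
counts = np.ones((n_states, n_actions))   # start at 1 to avoid div-by-zero

def step(s, a):
    """Toy dynamics: moving right along the chain pays off at the end."""
    if a == 1:
        s2 = min(s + 1, n_states - 1)
        r = 1.0 if s2 == n_states - 1 else 0.0
    else:
        s2, r = 0, 0.0
    return s2, r

s = 0
for t in range(5000):
    # Optimism: act greedily w.r.t. Q plus a bonus that shrinks as a
    # state-action pair is visited more often.
    bonus = c / np.sqrt(counts[s])
    a = int(np.argmax(Q[s] + bonus))
    s2, r = step(s, a)
    counts[s, a] += 1
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    s = s2

print(np.round(Q, 2))   # the right-moving policy should dominate
```

In a zero-sum Markov game, a bonus of this kind drives the learner toward every under-visited state, including states that are only reachable if the opponent cooperates in reaching them; under adversarial play the opponent never will, and this wasted sampling is exactly what the paper's notion of strategically efficient exploration is meant to avoid.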

Updated: 2021-08-02