Energy-based Surprise Minimization for Multi-Agent Value Factorization
arXiv - CS - Multiagent Systems Pub Date : 2020-09-16 , DOI: arxiv-2009.09842
Karush Suri, Xiao Qi Shi, Konstantinos Plataniotis, Yuri Lawryshyn

Multi-Agent Reinforcement Learning (MARL) has demonstrated significant success in training decentralised policies in a centralised manner by making use of value factorization methods. However, addressing surprise across spurious states and approximation bias remain open problems in multi-agent settings. We introduce the Energy-based MIXer (EMIX), an algorithm which minimizes surprise by utilizing the energy across agents. Our contributions are threefold: (1) EMIX introduces a novel surprise minimization technique across multiple agents in multi-agent partially-observable settings; (2) EMIX presents, to our knowledge, the first practical use of energy functions in MARL, with theoretical guarantees and experimental validation of the energy operator; and (3) EMIX presents a novel technique for addressing overestimation bias across agents in MARL. When evaluated on a range of challenging StarCraft II micromanagement scenarios, EMIX demonstrates consistent state-of-the-art performance for multi-agent surprise minimization. Moreover, our ablation study highlights the necessity of the energy-based scheme and of eliminating overestimation bias in MARL. Our implementation of EMIX and videos of agents are available at https://karush17.github.io/emix-web/.

Updated: 2020-10-06