Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise Rollouts
arXiv - CS - Multiagent Systems Pub Date : 2021-05-07 , DOI: arxiv-2105.03363 Weinan Zhang, Xihuai Wang, Jian Shen, Ming Zhou
This paper investigates model-based methods in multi-agent reinforcement
learning (MARL). We specify the dynamics sample complexity and the opponent
sample complexity in MARL, and conduct a theoretical analysis of a return-discrepancy
upper bound. To reduce this upper bound, and thereby keep sample complexity
low throughout learning, we propose a novel decentralized model-based MARL
method, named Adaptive Opponent-wise Rollout Policy Optimization (AORPO).
In AORPO, each agent builds its own multi-agent environment model, consisting
of a dynamics model and multiple opponent models, and trains its policy with
adaptive opponent-wise rollouts. We further prove the theoretical convergence
of AORPO under reasonable assumptions. Empirical experiments on competitive
and cooperative tasks demonstrate that AORPO achieves improved sample
efficiency with asymptotic performance comparable to that of the compared
MARL methods.
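The abstract does not spell out how the adaptive opponent-wise rollout works, so the following is only a minimal sketch of one plausible reading: each agent rolls out trajectories in its learned environment model, but truncates the use of each opponent model once that opponent's estimated model error makes longer simulation unreliable, falling back to real opponent actions afterwards. All names here (`AdaptiveOpponentRollout`, `opponent_errors`, the toy dynamics) are hypothetical illustrations, not the paper's actual API.

```python
class AdaptiveOpponentRollout:
    """Sketch of an opponent-wise rollout: simulate each opponent with its
    learned model only for a number of steps scaled by that model's accuracy."""

    def __init__(self, dynamics_model, opponent_models, opponent_errors, max_len=10):
        self.dynamics_model = dynamics_model    # (state, joint_action) -> (next_state, reward)
        self.opponent_models = opponent_models  # opponent_id -> learned policy model
        self.opponent_errors = opponent_errors  # opponent_id -> estimated model error in [0, 1]
        self.max_len = max_len

    def rollout_length(self, opp_id):
        # Adaptive rule (assumed): fewer simulated steps for less accurate models.
        return max(1, int(self.max_len * (1.0 - self.opponent_errors[opp_id])))

    def rollout(self, state, own_policy, real_opponent_action):
        trajectory = []
        for t in range(self.max_len):
            joint = {"self": own_policy(state)}
            for opp_id, model in self.opponent_models.items():
                if t < self.rollout_length(opp_id):
                    joint[opp_id] = model(state)  # simulated opponent action
                else:
                    # Past this opponent's budget, query the real opponent instead.
                    joint[opp_id] = real_opponent_action(opp_id, state)
            state, reward = self.dynamics_model(state, joint)
            trajectory.append((joint, reward, state))
        return trajectory


# Toy stand-ins to exercise the sketch.
def toy_dynamics(state, joint_action):
    return state + sum(joint_action.values()), 1.0

roller = AdaptiveOpponentRollout(
    toy_dynamics,
    opponent_models={"opp1": lambda s: 1, "opp2": lambda s: 2},
    opponent_errors={"opp1": 0.2, "opp2": 0.9},
    max_len=5,
)
traj = roller.rollout(0, own_policy=lambda s: 0,
                      real_opponent_action=lambda opp, s: 0)
```

With these toy numbers, the accurate opponent model (`opp1`, error 0.2) is simulated for 4 of the 5 steps, while the inaccurate one (`opp2`, error 0.9) is used for only 1 step, which is the intended opponent-wise adaptivity.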
Updated: 2021-05-10