Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise Rollouts
arXiv - CS - Multiagent Systems Pub Date : 2021-05-07 , DOI: arxiv-2105.03363 Weinan Zhang, Xihuai Wang, Jian Shen, Ming Zhou
This paper investigates model-based methods in multi-agent reinforcement
learning (MARL). We specify the dynamics sample complexity and the opponent
sample complexity in MARL, and conduct a theoretical analysis of a return-discrepancy
upper bound. To reduce this upper bound, and thereby keep sample complexity
low throughout learning, we propose a novel decentralized model-based MARL
method, named Adaptive Opponent-wise Rollout Policy Optimization (AORPO).
In AORPO, each agent builds its own multi-agent environment model, consisting
of a dynamics model and multiple opponent models, and trains its policy with
adaptive opponent-wise rollouts. We further prove the theoretical convergence
of AORPO under reasonable assumptions. Empirical experiments on competitive
and cooperative tasks demonstrate that AORPO achieves improved sample
efficiency with asymptotic performance comparable to that of the compared
MARL methods.
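The abstract does not spell out how the adaptive opponent-wise rollout works, so the following is only a minimal sketch of one plausible reading: each agent rolls out trajectories in its learned environment model, but truncates the use of each opponent model once that opponent's estimated model error makes longer simulation unreliable, falling back to real opponent actions afterwards. All names here (`AdaptiveOpponentRollout`, `opponent_errors`, the toy dynamics) are hypothetical illustrations, not the paper's actual API.

```python
class AdaptiveOpponentRollout:
    """Sketch of an opponent-wise rollout: simulate each opponent with its
    learned model only for a number of steps scaled by that model's accuracy."""

    def __init__(self, dynamics_model, opponent_models, opponent_errors, max_len=10):
        self.dynamics_model = dynamics_model    # (state, joint_action) -> (next_state, reward)
        self.opponent_models = opponent_models  # opponent_id -> learned policy model
        self.opponent_errors = opponent_errors  # opponent_id -> estimated model error in [0, 1]
        self.max_len = max_len

    def rollout_length(self, opp_id):
        # Adaptive rule (assumed): fewer simulated steps for less accurate models.
        return max(1, int(self.max_len * (1.0 - self.opponent_errors[opp_id])))

    def rollout(self, state, own_policy, real_opponent_action):
        trajectory = []
        for t in range(self.max_len):
            joint = {"self": own_policy(state)}
            for opp_id, model in self.opponent_models.items():
                if t < self.rollout_length(opp_id):
                    joint[opp_id] = model(state)  # simulated opponent action
                else:
                    # Past this opponent's budget, query the real opponent instead.
                    joint[opp_id] = real_opponent_action(opp_id, state)
            state, reward = self.dynamics_model(state, joint)
            trajectory.append((joint, reward, state))
        return trajectory


# Toy stand-ins to exercise the sketch.
def toy_dynamics(state, joint_action):
    return state + sum(joint_action.values()), 1.0

roller = AdaptiveOpponentRollout(
    toy_dynamics,
    opponent_models={"opp1": lambda s: 1, "opp2": lambda s: 2},
    opponent_errors={"opp1": 0.2, "opp2": 0.9},
    max_len=5,
)
traj = roller.rollout(0, own_policy=lambda s: 0,
                      real_opponent_action=lambda opp, s: 0)
```

With these toy numbers, the accurate opponent model (`opp1`, error 0.2) is simulated for 4 of the 5 steps, while the inaccurate one (`opp2`, error 0.9) is used for only 1 step, which is the intended opponent-wise adaptivity.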
Updated: 2021-05-10