Local Information Opponent Modelling Using Variational Autoencoders
arXiv - CS - Multiagent Systems. Pub Date: 2020-06-16. DOI: arxiv-2006.09447. Authors: Georgios Papoudakis, Filippos Christianos, Stefano V. Albrecht
Modelling the behaviours of other agents (opponents) is essential for
understanding how agents interact and making effective decisions. Existing
methods for opponent modelling commonly assume knowledge of the local
observations and chosen actions of the modelled opponents, which can
significantly limit their applicability. We propose a new modelling technique
based on variational autoencoders, which are trained to reconstruct the local
actions and observations of the opponent based on embeddings which depend only
on the local observations of the modelling agent (its observed world state,
chosen actions, and received rewards). The embeddings are used to augment the
modelling agent's decision policy which is trained via deep reinforcement
learning; thus the policy does not require access to opponent observations. We
provide a comprehensive evaluation and ablation study in diverse multi-agent
tasks, showing that our method achieves comparable performance to an ideal
baseline which has full access to the opponent's information, and significantly
higher returns than a baseline method which does not use the learned
embeddings.
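
The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a single time step, tiny linear encoder and decoder maps, a squared-error reconstruction term, and a standard-normal prior; every name, dimension, and value below is hypothetical. The encoder sees only the modelling agent's own observation, action, and reward, the decoder is asked to reconstruct the opponent's observation and action (needed as a training target only), and the sampled embedding is appended to the policy input.

```python
import math
import random


def init_linear(rng, n_out, n_in, scale=0.1):
    """Return (weights, bias) for a small randomly initialised affine map."""
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(weights, bias, x):
    """Apply the affine map: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def encode(params, local_input):
    """Map the agent's own (observation, action, reward) to a diagonal Gaussian."""
    mean = linear(*params["enc_mean"], local_input)
    log_var = linear(*params["enc_log_var"], local_input)
    return mean, log_var


def sample_embedding(rng, mean, log_var):
    """Reparameterisation: z = mean + std * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mean, log_var)]


def kl_to_standard_normal(mean, log_var):
    """Closed-form KL( N(mean, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mean, log_var))


def vae_loss(params, rng, local_input, opponent_target):
    """Reconstruction error on the opponent's data plus the KL regulariser."""
    mean, log_var = encode(params, local_input)
    z = sample_embedding(rng, mean, log_var)
    reconstruction = linear(*params["dec"], z)
    recon_error = sum((r - t) ** 2 for r, t in zip(reconstruction, opponent_target))
    return recon_error + kl_to_standard_normal(mean, log_var), z


# Hypothetical sizes: local input (4), opponent target (3), embedding (2).
LOCAL_DIM, OPP_DIM, EMBED_DIM = 4, 3, 2
rng = random.Random(0)
params = {
    "enc_mean": init_linear(rng, EMBED_DIM, LOCAL_DIM),
    "enc_log_var": init_linear(rng, EMBED_DIM, LOCAL_DIM),
    "dec": init_linear(rng, OPP_DIM, EMBED_DIM),
}

# Made-up values: own observation (2 numbers), own action (1), reward (1).
local_input = [0.5, -0.2, 1.0, 0.0]
# Opponent observation (2 numbers) and action (1); used only as a training target.
opponent_target = [0.1, 0.3, 1.0]

loss, z = vae_loss(params, rng, local_input, opponent_target)

# The policy input is the agent's own observation augmented with the embedding;
# it never contains the opponent's observation or action.
own_observation = local_input[:2]
policy_input = own_observation + z
```

In this sketch the opponent's data enters only through the loss, so computing `policy_input` at execution time needs nothing beyond the modelling agent's local information. A practical version would use a recurrent or deep encoder, a categorical likelihood for discrete actions, and gradient-based training, none of which are shown here.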
Updated: 2020-10-07