Informative Policy Representations in Multi-Agent Reinforcement Learning via Joint-Action Distributions
arXiv - CS - Machine Learning. Pub Date: 2021-06-10, DOI: arxiv-2106.05802
Yifan Yu, Haobin Jiang, Zongqing Lu

In multi-agent reinforcement learning, the inherent non-stationarity of the environment caused by other agents' actions poses significant difficulties for an agent to learn a good policy independently. One way to deal with non-stationarity is agent modeling, by which the agent takes into consideration the influence of other agents' policies. Most existing work relies on predicting other agents' actions or goals, or on discriminating between their policies. However, such modeling fails to capture the similarities and differences between policies simultaneously and thus cannot provide useful information when generalizing to unseen policies. To address this, we propose a general method to learn representations of other agents' policies via the joint-action distributions sampled in interactions. The similarities and differences between policies are naturally captured by the policy distance inferred from the joint-action distributions and deliberately reflected in the learned representations. Agents conditioned on the policy representations can generalize well to unseen agents. We empirically demonstrate that our method outperforms existing work in multi-agent tasks when facing unseen agents.
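
As a rough illustration of the idea in the abstract, the sketch below estimates a policy distance from empirical joint-action distributions sampled in interactions. It is a minimal sketch under stated assumptions, not the authors' implementation: the discrete joint-action space, the choice of Jensen-Shannon distance, and all function names here are illustrative.

```python
# Minimal sketch (not the paper's exact method): infer a policy distance from
# empirical joint-action distributions collected while interacting with agents.
import numpy as np

def empirical_joint_action_dist(samples, n_agents, n_actions):
    """Count sampled joint actions and normalize into a distribution.

    samples: iterable of joint actions, each a tuple of length n_agents.
    """
    counts = np.zeros((n_actions,) * n_agents)
    for joint_action in samples:
        counts[joint_action] += 1
    return counts / counts.sum()

def js_distance(p, q, eps=1e-12):
    """Jensen-Shannon distance between two joint-action distributions."""
    p, q = p.ravel() + eps, q.ravel() + eps
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return np.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))

# Toy usage: two hypothetical opponent policies in a 2-agent, 3-action game.
rng = np.random.default_rng(0)
policy_a = [(rng.integers(3), rng.integers(3)) for _ in range(500)]  # near-uniform
policy_b = [(0, rng.integers(2)) for _ in range(500)]                # strongly biased
dist_same = js_distance(empirical_joint_action_dist(policy_a[:250], 2, 3),
                        empirical_joint_action_dist(policy_a[250:], 2, 3))
dist_diff = js_distance(empirical_joint_action_dist(policy_a, 2, 3),
                        empirical_joint_action_dist(policy_b, 2, 3))
print(f"same policy: {dist_same:.3f}, different policies: {dist_diff:.3f}")
```

A representation encoder would then be trained so that distances between its policy embeddings reflect these inferred policy distances, and the agent's own policy would be conditioned on the resulting embeddings.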

Updated: 2021-06-11