Modelling Bounded Rationality in Multi-Agent Interactions by Generalized Recursive Reasoning
arXiv - CS - Artificial Intelligence. Pub Date: 2019-01-26, DOI: arxiv-1901.09216
Ying Wen, Yaodong Yang, Rui Luo, Jun Wang

Although rationality is limited in real-world decision making, most multi-agent reinforcement learning (MARL) models assume perfectly rational agents, a property that is rarely met because of individuals' cognitive limitations and/or the intractability of the decision problem. In this paper, we introduce generalized recursive reasoning (GR2) as a novel framework to model agents with different hierarchical levels of rationality; our framework enables agents to exhibit varying levels of "thinking" ability, thereby allowing higher-level agents to best respond to various less sophisticated learners. We contribute both theoretically and empirically. On the theory side, we devise the hierarchical framework of GR2 through probabilistic graphical models and prove the existence of a perfect Bayesian equilibrium. Within GR2, we propose a practical actor-critic solver and demonstrate its convergence to a stationary point in two-player games through Lyapunov analysis. On the empirical side, we validate our findings on a variety of MARL benchmarks. Specifically, we first illustrate the hierarchical thinking process on the Keynes Beauty Contest, and then demonstrate significant improvements over state-of-the-art opponent modeling baselines on normal-form games and the cooperative navigation benchmark.
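The hierarchical thinking process mentioned in the abstract can be illustrated with the classic level-k model of the beauty contest game. The sketch below is not the GR2 method from the paper. It is a minimal illustration under stated assumptions: every player guesses a number, the target is `p` times the average guess, a level-0 player guesses a fixed anchor, and a level-k player best responds to a population made up entirely of level-(k-1) players while ignoring its own effect on the average. The function name `level_k_guess`, the default `p = 2/3`, and the anchor value of 50 are hypothetical choices made for this example.

```python
def level_k_guess(k: int, p: float = 2 / 3, level0_guess: float = 50.0) -> float:
    """Return the guess of a level-k player in a p-beauty contest.

    Assumptions (illustrative, not taken from the paper):
      * level-0 players guess `level0_guess` without any reasoning;
      * a level-k player believes everyone else is level-(k-1), so its
        best response is p times the level-(k-1) guess.
    """
    if k < 0:
        raise ValueError("reasoning level k must be non-negative")
    guess = level0_guess
    for _ in range(k):
        # One more step of recursive reasoning: respond to the level below.
        guess = p * guess
    return guess


if __name__ == "__main__":
    # Deeper reasoning levels shrink the guess towards 0.
    for level in range(5):
        print(level, round(level_k_guess(level), 2))
```

Under these assumptions, each extra level of reasoning multiplies the guess by `p`, so guesses shrink towards 0 as the level grows. A player who knows the levels of its opponents can respond one step above them, which is the intuition behind higher-level agents best responding to less sophisticated ones.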

Updated: 2020-01-22