Bounds and dynamics for empirical game theoretic analysis
Autonomous Agents and Multi-Agent Systems (IF 2.0). Pub Date: 2019-12-04. DOI: 10.1007/s10458-019-09432-y
Karl Tuyls, Julien Perolat, Marc Lanctot, Edward Hughes, Richard Everett, Joel Z. Leibo, Csaba Szepesvári, Thore Graepel

This paper provides several theoretical results for empirical game theory. Specifically, we introduce bounds for the empirical game-theoretic analysis of complex multi-agent interactions. In doing so we provide insights into the empirical meta-game, showing that a Nash equilibrium of the estimated meta-game is an approximate Nash equilibrium of the true underlying meta-game. We show how many data samples are required to obtain a sufficiently close approximation of the underlying game. Additionally, we extend the evolutionary dynamics analysis of meta-games using heuristic payoff tables (HPTs) to asymmetric games. The state of the art has so far considered only the evolutionary dynamics of symmetric HPTs, in which agents have access to the same strategy sets and the payoff structure is symmetric, implying that agents are interchangeable. Finally, we illustrate the generalised method, and the theory and evolutionary dynamics it yields, in several domains: several versions of the AlphaGo algorithm (symmetric), the Colonel Blotto game played by human players on Facebook (symmetric), several teams of players in a capture-the-flag game (symmetric), and a meta-game in Leduc poker (asymmetric) generated by the policy-space response oracles (PSRO) multi-agent learning algorithm.
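The central bound is a concentration argument: if every entry of the estimated payoff table lies within ε of its true value, a Nash equilibrium of the estimated meta-game is an approximate Nash equilibrium of the true one (within 2ε by the standard perturbation argument). Below is a minimal Python sketch of that pipeline on an invented rock-paper-scissors-style meta-game; `true_payoffs`, the noise model, and the sample sizes are illustrative assumptions, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" symmetric 3-strategy meta-game (row player's expected
# payoffs). In practice each entry is estimated by simulating a strategy profile.
true_payoffs = np.array([[ 0.0,  1.0, -1.0],
                         [-1.0,  0.0,  1.0],
                         [ 1.0, -1.0,  0.0]])

def estimate_payoffs(n_samples, noise=0.5):
    """Monte Carlo estimate: average n_samples noisy outcomes per entry."""
    draws = true_payoffs[None] + rng.normal(0.0, noise, (n_samples, 3, 3))
    return draws.mean(axis=0)

def nash_gap(payoffs, x):
    """Best gain from a unilateral deviation when both players mix with x;
    zero iff x is an exact symmetric Nash equilibrium of `payoffs`."""
    return (payoffs @ x).max() - x @ payoffs @ x

x_star = np.ones(3) / 3  # exact Nash equilibrium of the true game above
for n in (10, 100, 10_000):
    est = estimate_payoffs(n)
    eps = np.abs(est - true_payoffs).max()
    print(f"n={n:6d}  entrywise error={eps:.3f}  "
          f"gap of x_star in estimated game={nash_gap(est, x_star):.3f}")
```

As n grows the entrywise error shrinks at the usual Hoeffding rate, and the equilibrium gap shrinks with it; how large n must be for a given ε and confidence level is the sample-complexity question the paper answers.

For the evolutionary-dynamics extension, the single-population replicator equation used for symmetric HPTs is replaced by coupled two-population dynamics, one strategy distribution per player role. A sketch of those standard asymmetric replicator dynamics on an invented battle-of-the-sexes bimatrix, standing in for payoffs derived from an estimated HPT:

```python
import numpy as np

A = np.array([[2.0, 0.0], [0.0, 1.0]])  # row player's payoffs (hypothetical)
B = np.array([[1.0, 0.0], [0.0, 2.0]])  # column player's payoffs (hypothetical)

def replicator_step(x, y, dt=0.01):
    """One Euler step of the asymmetric replicator dynamics:
    dx_i/dt = x_i[(A y)_i - x^T A y],  dy_j/dt = y_j[(B^T x)_j - x^T B y]."""
    fx, fy = A @ y, B.T @ x
    x = x + dt * x * (fx - x @ fx)
    y = y + dt * y * (fy - y @ fy)
    return x, y

x, y = np.array([0.6, 0.4]), np.array([0.4, 0.6])
for _ in range(5_000):
    x, y = replicator_step(x, y)
print(np.round(x, 3), np.round(y, 3))  # converges to one of the pure equilibria
```

In the paper these dynamics are driven by payoffs read out of an estimated heuristic payoff table rather than a known bimatrix, which is what lets the same phase-portrait analysis cover the asymmetric Leduc poker meta-game.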
