Multiagent Evaluation under Incomplete Information
arXiv - CS - Multiagent Systems, Pub Date: 2019-09-21, DOI: arxiv-1909.09849
Mark Rowland, Shayegan Omidshafiei, Karl Tuyls, Julien Perolat, Michal Valko, Georgios Piliouras, Remi Munos

This paper investigates the evaluation of learned multiagent strategies in the incomplete information setting, which plays a critical role in ranking and training of agents. Traditionally, researchers have relied on Elo ratings for this purpose, with recent works also using methods based on Nash equilibria. Unfortunately, Elo is unable to handle intransitive agent interactions, and other techniques are restricted to zero-sum, two-player settings or are limited by the fact that the Nash equilibrium is intractable to compute. Recently, a ranking method called α-Rank, relying on a new graph-based game-theoretic solution concept, was shown to tractably apply to general games. However, evaluations based on Elo or α-Rank typically assume noise-free game outcomes, despite the data often being collected from noisy simulations, making this assumption unrealistic in practice. This paper investigates multiagent evaluation in the incomplete information regime, involving general-sum many-player games with noisy outcomes. We derive sample complexity guarantees required to confidently rank agents in this setting. We propose adaptive algorithms for accurate ranking, provide correctness and sample complexity guarantees, then introduce a means of connecting uncertainties in noisy match outcomes to uncertainties in rankings. We evaluate the performance of these approaches in several domains, including Bernoulli games, a soccer meta-game, and Kuhn poker.

Updated: 2020-01-13