R2-B2: Recursive Reasoning-Based Bayesian Optimization for No-Regret Learning in Games
arXiv - CS - Machine Learning Pub Date : 2020-06-30 , DOI: arxiv-2006.16679 Zhongxiang Dai, Yizhou Chen, Kian Hsiang Low, Patrick Jaillet, Teck-Hua Ho
This paper presents a recursive reasoning formalism of Bayesian optimization
(BO) to model the reasoning process in the interactions between boundedly
rational, self-interested agents with unknown, complex, and costly-to-evaluate
payoff functions in repeated games, which we call Recursive Reasoning-Based BO
(R2-B2). Our R2-B2 algorithm is general in that it does not constrain the
relationship among the payoff functions of different agents and can thus be
applied to various types of games such as constant-sum, general-sum, and
common-payoff games. We prove that by reasoning at level 2 or more and at one
level higher than the other agents, our R2-B2 agent can achieve faster
asymptotic convergence to no regret than that without utilizing recursive
reasoning. We also propose a computationally cheaper variant of R2-B2 called
R2-B2-Lite at the expense of a weaker convergence guarantee. The performance
and generality of our R2-B2 algorithm are empirically demonstrated using
synthetic games, adversarial machine learning, and multi-agent reinforcement
learning.
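The level-k reasoning idea in the abstract — an agent simulating its opponent's lower-level reasoning, then best-responding under a Bayesian-optimization acquisition — can be sketched as follows. This is a minimal illustration, not the paper's exact R2-B2 formulation: the payoff function, the RBF kernel, the GP-UCB acquisition, and the specific level-k simulation scheme are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def payoff(x1, x2):
    # Hypothetical unknown, costly-to-evaluate payoff to agent 0
    # (agent 1 receives its negation: a constant-sum game).
    return np.sin(3 * x1) * np.cos(2 * x2)

def rbf(A, B, ls=0.3):
    # Squared-exponential kernel over joint action profiles.
    d = A[:, None, :] - B[None, :, :]
    return np.exp(-0.5 * np.sum(d ** 2, axis=-1) / ls ** 2)

def gp_ucb(X_obs, y_obs, X_cand, beta=2.0, noise=1e-4):
    # Standard GP posterior mean + beta * std as the acquisition value.
    K = rbf(X_obs, X_obs) + noise * np.eye(len(X_obs))
    Ks = rbf(X_cand, X_obs)
    Kinv = np.linalg.inv(K)
    mu = Ks @ Kinv @ y_obs
    var = 1.0 - np.sum((Ks @ Kinv) * Ks, axis=1)
    return mu + beta * np.sqrt(np.maximum(var, 0.0))

def level_k_action(agent, k, grid, X_obs, y_obs):
    """Level 0 randomizes; level k >= 1 best-responds (via GP-UCB on the
    joint action profile) to the opponent's simulated level-(k-1) action."""
    if k == 0:
        return rng.choice(grid)
    opp = level_k_action(1 - agent, k - 1, grid, X_obs, y_obs)
    X_cand = (np.column_stack([grid, np.full_like(grid, opp)]) if agent == 0
              else np.column_stack([np.full_like(grid, opp), grid]))
    sign = 1.0 if agent == 0 else -1.0  # constant-sum: agent 1 maximizes -payoff
    return grid[np.argmax(gp_ucb(X_obs, sign * y_obs, X_cand))]

grid = np.linspace(0.0, 1.0, 25)
X_obs = rng.uniform(0.0, 1.0, size=(3, 2))   # seed joint action profiles
y_obs = payoff(X_obs[:, 0], X_obs[:, 1])

for _ in range(10):                           # repeated game
    a0 = level_k_action(0, 2, grid, X_obs, y_obs)  # agent 0 reasons at level 2
    a1 = level_k_action(1, 1, grid, X_obs, y_obs)  # agent 1 reasons at level 1
    X_obs = np.vstack([X_obs, [a0, a1]])
    y_obs = np.append(y_obs, payoff(a0, a1))

print(len(X_obs))  # 13 observed joint action profiles
```

Here agent 0 reasons one level above agent 1, mirroring the abstract's condition for the faster no-regret convergence; the shared observation history stands in for the repeated-game feedback each agent accumulates.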
Updated: 2020-07-01