R2-B2: Recursive Reasoning-Based Bayesian Optimization for No-Regret Learning in Games
arXiv - CS - Machine Learning Pub Date : 2020-06-30 , DOI: arxiv-2006.16679 Zhongxiang Dai, Yizhou Chen, Kian Hsiang Low, Patrick Jaillet, Teck-Hua Ho
This paper presents a recursive reasoning formalism of Bayesian optimization
(BO) to model the reasoning process in the interactions between boundedly
rational, self-interested agents with unknown, complex, and costly-to-evaluate
payoff functions in repeated games, which we call Recursive Reasoning-Based BO
(R2-B2). Our R2-B2 algorithm is general in that it does not constrain the
relationship among the payoff functions of different agents and can thus be
applied to various types of games such as constant-sum, general-sum, and
common-payoff games. We prove that by reasoning at level 2 or more and at one
level higher than the other agents, our R2-B2 agent can achieve faster
asymptotic convergence to no regret than that without utilizing recursive
reasoning. We also propose a computationally cheaper variant of R2-B2 called
R2-B2-Lite at the expense of a weaker convergence guarantee. The performance
and generality of our R2-B2 algorithm are empirically demonstrated using
synthetic games, adversarial machine learning, and multi-agent reinforcement
learning.
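The level-k reasoning idea in the abstract — an agent simulating its opponent's lower-level reasoning, then best-responding under a Bayesian-optimization acquisition — can be sketched as follows. This is a minimal illustration, not the paper's exact R2-B2 formulation: the payoff function, the RBF kernel, the GP-UCB acquisition, and the specific level-k simulation scheme are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def payoff(x1, x2):
    # Hypothetical unknown, costly-to-evaluate payoff to agent 0
    # (agent 1 receives its negation: a constant-sum game).
    return np.sin(3 * x1) * np.cos(2 * x2)

def rbf(A, B, ls=0.3):
    # Squared-exponential kernel over joint action profiles.
    d = A[:, None, :] - B[None, :, :]
    return np.exp(-0.5 * np.sum(d ** 2, axis=-1) / ls ** 2)

def gp_ucb(X_obs, y_obs, X_cand, beta=2.0, noise=1e-4):
    # Standard GP posterior mean + beta * std as the acquisition value.
    K = rbf(X_obs, X_obs) + noise * np.eye(len(X_obs))
    Ks = rbf(X_cand, X_obs)
    Kinv = np.linalg.inv(K)
    mu = Ks @ Kinv @ y_obs
    var = 1.0 - np.sum((Ks @ Kinv) * Ks, axis=1)
    return mu + beta * np.sqrt(np.maximum(var, 0.0))

def level_k_action(agent, k, grid, X_obs, y_obs):
    """Level 0 randomizes; level k >= 1 best-responds (via GP-UCB on the
    joint action profile) to the opponent's simulated level-(k-1) action."""
    if k == 0:
        return rng.choice(grid)
    opp = level_k_action(1 - agent, k - 1, grid, X_obs, y_obs)
    X_cand = (np.column_stack([grid, np.full_like(grid, opp)]) if agent == 0
              else np.column_stack([np.full_like(grid, opp), grid]))
    sign = 1.0 if agent == 0 else -1.0  # constant-sum: agent 1 maximizes -payoff
    return grid[np.argmax(gp_ucb(X_obs, sign * y_obs, X_cand))]

grid = np.linspace(0.0, 1.0, 25)
X_obs = rng.uniform(0.0, 1.0, size=(3, 2))   # seed joint action profiles
y_obs = payoff(X_obs[:, 0], X_obs[:, 1])

for _ in range(10):                           # repeated game
    a0 = level_k_action(0, 2, grid, X_obs, y_obs)  # agent 0 reasons at level 2
    a1 = level_k_action(1, 1, grid, X_obs, y_obs)  # agent 1 reasons at level 1
    X_obs = np.vstack([X_obs, [a0, a1]])
    y_obs = np.append(y_obs, payoff(a0, a1))

print(len(X_obs))  # 13 observed joint action profiles
```

Here agent 0 reasons one level above agent 1, mirroring the abstract's condition for the faster no-regret convergence; the shared observation history stands in for the repeated-game feedback each agent accumulates.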
Updated: 2020-07-01