Minimax Sample Complexity for Turn-based Stochastic Game
arXiv - CS - Machine Learning. Pub Date: 2020-11-29. DOI: arxiv-2011.14267. Qiwen Cui, Lin F. Yang
The empirical success of multi-agent reinforcement learning is encouraging,
yet few theoretical guarantees have been established. In this work, we prove
that the plug-in solver approach, probably the most natural reinforcement
learning algorithm, achieves minimax sample complexity for turn-based
stochastic games (TBSG). Specifically, we plan in an empirical TBSG by utilizing
a `simulator' that allows sampling from arbitrary state-action pairs. We show
that the empirical Nash equilibrium strategy is an approximate Nash equilibrium
strategy in the true TBSG and give both problem-dependent and
problem-independent bounds. We develop absorbing TBSG and reward perturbation
techniques to tackle the complex statistical dependence. The key idea is to
artificially introduce a suboptimality gap in the TBSG so that the Nash
equilibrium strategy lies in a finite set.
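
The plug-in solver approach described in the abstract can be illustrated with a small sketch: draw a fixed number of simulator samples for every state-action pair, form the empirical transition model, and solve the resulting empirical TBSG by value iteration in which the max-player maximizes and the min-player minimizes at their own states. The Python below is only a minimal illustration under assumed interfaces (the names simulator, owner, reward, and plug_in_solver_tbsg are hypothetical and not from the paper), not the authors' implementation or the exact algorithm analyzed.

import numpy as np

def plug_in_solver_tbsg(simulator, n_states, n_actions, owner, reward,
                        gamma=0.99, n_samples=100, n_iters=1000):
    # simulator(s, a): returns one next state sampled from the true P(.|s, a)
    # owner: array of shape (n_states,), 1 if the max-player moves at s, 2 otherwise
    # reward: known reward table of shape (n_states, n_actions)

    # Step 1: build the empirical transition model P_hat from simulator calls.
    P_hat = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                P_hat[s, a, simulator(s, a)] += 1.0
    P_hat /= n_samples

    # Step 2: solve the empirical TBSG by value iteration; the max-player
    # maximizes over actions at its states, the min-player minimizes at its own.
    V = np.zeros(n_states)
    for _ in range(n_iters):
        Q = reward + gamma * (P_hat @ V)   # Q has shape (n_states, n_actions)
        V = np.where(owner == 1, Q.max(axis=1), Q.min(axis=1))

    # Step 3: the greedy strategy pair in the empirical model is the empirical
    # Nash equilibrium that the plug-in approach outputs.
    Q = reward + gamma * (P_hat @ V)
    pi = np.where(owner == 1, Q.argmax(axis=1), Q.argmin(axis=1))
    return pi, V

In this sketch, the returned pi plays the role of the empirical Nash equilibrium strategy; how close it is to a Nash equilibrium of the true TBSG, as a function of n_samples, is what the paper's problem-dependent and problem-independent bounds quantify.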
Updated: 2020-12-01