Minimax Sample Complexity for Turn-based Stochastic Game
arXiv - CS - Machine Learning. Pub Date: 2020-11-29, DOI: arXiv-2011.14267
Qiwen Cui, Lin F. Yang

The empirical success of multi-agent reinforcement learning is encouraging, but few theoretical guarantees have been established. In this work, we prove that the plug-in solver approach, arguably the most natural reinforcement learning algorithm, achieves minimax sample complexity for turn-based stochastic games (TBSGs). Specifically, we plan in an empirical TBSG built by querying a `simulator' that allows sampling from arbitrary state-action pairs. We show that the Nash equilibrium strategy of the empirical TBSG is an approximate Nash equilibrium strategy in the true TBSG, and we give both problem-dependent and problem-independent bounds. We develop absorbing-TBSG and reward-perturbation techniques to tackle the complex statistical dependence. The key idea is to artificially introduce a suboptimality gap in the TBSG, so that the Nash equilibrium strategy lies in a finite set.
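The plug-in solver approach described above can be illustrated with a minimal sketch: draw samples from a simulator for every state-action pair, build an empirical transition model, and solve the resulting empirical TBSG by value iteration, where the controlling player at each state maximizes or minimizes. This is an illustrative toy implementation, not the paper's analysis; the function and parameter names (`simulator`, `owner`, `n_samples`) are assumptions for the sketch.

```python
import numpy as np

def plug_in_tbsg(simulator, n_states, n_actions, owner, reward,
                 gamma=0.9, n_samples=100, n_iters=500, seed=0):
    """Toy plug-in solver sketch for a turn-based stochastic game.

    simulator(s, a, rng) returns a sampled next state; owner[s] is 0 for
    the max-player and 1 for the min-player (illustrative convention).
    """
    rng = np.random.default_rng(seed)

    # Build the empirical transition model P_hat[s, a, s'] from
    # n_samples simulator queries per state-action pair.
    P_hat = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                P_hat[s, a, simulator(s, a, rng)] += 1.0
    P_hat /= n_samples

    # Value iteration on the empirical TBSG: the player who owns each
    # state maximizes (player 0) or minimizes (player 1) its value.
    V = np.zeros(n_states)
    for _ in range(n_iters):
        Q = reward + gamma * (P_hat @ V)            # shape (S, A)
        V = np.where(owner == 0, Q.max(axis=1), Q.min(axis=1))

    # Greedy strategy w.r.t. the empirical model: the (approximate)
    # Nash equilibrium strategy of the empirical TBSG.
    pi = np.where(owner == 0, Q.argmax(axis=1), Q.argmin(axis=1))
    return V, pi
```

The paper's point is that this strategy, computed purely from the empirical model, is an approximate Nash equilibrium of the true TBSG with a minimax-optimal number of simulator queries.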

Last updated: 2020-12-01