Minimax Sample Complexity for Turn-based Stochastic Game
arXiv - CS - Machine Learning. Pub Date: 2020-11-29. DOI: arxiv-2011.14267. Qiwen Cui, Lin F. Yang
The empirical success of multi-agent reinforcement learning is encouraging,
yet few theoretical guarantees have been established. In this work, we prove
that the plug-in solver approach, probably the most natural reinforcement
learning algorithm, achieves minimax sample complexity for turn-based
stochastic games (TBSG). Specifically, we plan in an empirical TBSG by utilizing
a `simulator' that allows sampling from arbitrary state-action pairs. We show
that the empirical Nash equilibrium strategy is an approximate Nash equilibrium
strategy in the true TBSG and give both problem-dependent and
problem-independent bounds. We develop absorbing TBSG and reward perturbation
techniques to tackle the complex statistical dependence. The key idea is to
artificially introduce a suboptimality gap in the TBSG so that the Nash
equilibrium strategy lies in a finite set.
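
The plug-in solver approach described in the abstract can be illustrated with a small sketch: draw a fixed number of simulator samples for every state-action pair, form the empirical transition model, and solve the resulting empirical TBSG by value iteration in which the max-player maximizes and the min-player minimizes at their own states. The Python below is only a minimal illustration under assumed interfaces (the names simulator, owner, reward, and plug_in_solver_tbsg are hypothetical and not from the paper), not the authors' implementation or the exact algorithm analyzed.

import numpy as np

def plug_in_solver_tbsg(simulator, n_states, n_actions, owner, reward,
                        gamma=0.99, n_samples=100, n_iters=1000):
    # simulator(s, a): returns one next state sampled from the true P(.|s, a)
    # owner: array of shape (n_states,), 1 if the max-player moves at s, 2 otherwise
    # reward: known reward table of shape (n_states, n_actions)

    # Step 1: build the empirical transition model P_hat from simulator calls.
    P_hat = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                P_hat[s, a, simulator(s, a)] += 1.0
    P_hat /= n_samples

    # Step 2: solve the empirical TBSG by value iteration; the max-player
    # maximizes over actions at its states, the min-player minimizes at its own.
    V = np.zeros(n_states)
    for _ in range(n_iters):
        Q = reward + gamma * (P_hat @ V)   # Q has shape (n_states, n_actions)
        V = np.where(owner == 1, Q.max(axis=1), Q.min(axis=1))

    # Step 3: the greedy strategy pair in the empirical model is the empirical
    # Nash equilibrium that the plug-in approach outputs.
    Q = reward + gamma * (P_hat @ V)
    pi = np.where(owner == 1, Q.argmax(axis=1), Q.argmin(axis=1))
    return pi, V

In this sketch, the returned pi plays the role of the empirical Nash equilibrium strategy; how close it is to a Nash equilibrium of the true TBSG, as a function of n_samples, is what the paper's problem-dependent and problem-independent bounds quantify.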
Updated: 2020-12-01