Towards solving 2-TBSG efficiently, Optimization Methods & Software

Towards solving 2-TBSG efficiently
Optimization Methods & Software (IF 1.4), Pub Date: 2019-12-10, DOI: 10.1080/10556788.2019.1695131
Zeyu Jia, Zaiwen Wen, Yinyu Ye

A two-player turn-based stochastic game (2-TBSG) is a two-player game model in which the goal is to find Nash equilibria; it is widely used in reinforcement learning and AI. Inspired by the fact that the simplex method for solving deterministic discounted Markov decision processes is strongly polynomial independent of the discount factor, we attempt to answer the open problem of whether a similar algorithm exists for 2-TBSGs. We develop a simplex strategy iteration, in which one player updates its strategy with a simplex step while the other player finds an optimal counterstrategy in turn, as well as a modified simplex strategy iteration. Both belong to a class of geometrically converging algorithms. We establish the strongly polynomial property of these algorithms by considering a strategy that combines the current strategy and the equilibrium strategy. Moreover, we present a method to transform general 2-TBSGs into special 2-TBSGs in which each state has exactly two actions.
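
To make the alternation described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It rests on several assumptions that the abstract does not state: player 1 maximizes and player 2 minimizes the discounted reward, the simplex step uses a largest-gain (Dantzig-style) pivot rule, the counterstrategy is found by plain policy iteration, and strategies are evaluated by repeated Bellman backups rather than an exact linear solve. All names (`q_value`, `evaluate`, `best_response`, `simplex_strategy_iteration`) and the toy game are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Action = Tuple[float, Dict[int, float]]  # (reward, {next_state: probability})
Game = List[List[Action]]                # game[s] = actions available at state s


def q_value(action: Action, v: List[float], gamma: float) -> float:
    """One-step lookahead value of an action under value estimate v."""
    reward, transitions = action
    return reward + gamma * sum(p * v[t] for t, p in transitions.items())


def evaluate(game: Game, policy: List[int], gamma: float,
             tol: float = 1e-12, max_sweeps: int = 10_000) -> List[float]:
    """Approximate the value of a fixed joint strategy by Bellman backups."""
    v = [0.0] * len(game)
    for _ in range(max_sweeps):
        new_v = [q_value(game[s][policy[s]], v, gamma) for s in range(len(game))]
        done = max(abs(a - b) for a, b in zip(new_v, v)) < tol
        v = new_v
        if done:
            break
    return v


def best_response(game: Game, owner: List[int], policy: List[int],
                  gamma: float, eps: float = 1e-9):
    """Assumed minimizer (player 2): policy iteration with player 1 held fixed."""
    policy = list(policy)
    while True:
        v = evaluate(game, policy, gamma)
        changed = False
        for s, actions in enumerate(game):
            if owner[s] != 2:
                continue
            qs = [q_value(act, v, gamma) for act in actions]
            best = min(range(len(actions)), key=qs.__getitem__)
            if qs[best] < v[s] - eps:  # strictly better for the minimizer
                policy[s], changed = best, True
        if not changed:
            return policy, v


def simplex_strategy_iteration(game: Game, owner: List[int], gamma: float,
                               policy: Optional[List[int]] = None,
                               eps: float = 1e-9):
    """Alternate one simplex step for player 1 with a best response by player 2."""
    policy = list(policy) if policy is not None else [0] * len(game)
    while True:
        policy, v = best_response(game, owner, policy, gamma)
        # Simplex step: change only the single player-1 action with the largest gain.
        gain, pivot = max(
            ((q_value(act, v, gamma) - v[s], (s, a))
             for s, actions in enumerate(game) if owner[s] == 1
             for a, act in enumerate(actions)),
            default=(0.0, None),
        )
        if pivot is None or gain <= eps:
            return policy, v  # no improving switch remains for player 1
        policy[pivot[0]] = pivot[1]


# Hypothetical toy game: state 0 belongs to player 1, state 1 to player 2,
# and state 2 is an absorbing zero-reward state.
game = [
    [(1.0, {2: 1.0}), (0.0, {1: 1.0})],
    [(4.0, {2: 1.0}), (0.0, {0: 1.0})],
    [(0.0, {2: 1.0})],
]
owner = [1, 2, 1]
policy, values = simplex_strategy_iteration(game, owner, gamma=0.5, policy=[1, 0, 0])
```

On this toy game, starting from `policy=[1, 0, 0]`, the loop takes a single simplex step at state 0 and stops at the strategy pair `[0, 1, 0]` with values close to `[1.0, 0.5, 0.0]`. Changing only one action per round is what distinguishes the simplex step from ordinary strategy iteration, which would switch every improvable state at once.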

Updated: 2019-12-10
