Learning Nash Equilibria in Zero-Sum Stochastic Games via Entropy-Regularized Policy Approximation
arXiv - CS - Computer Science and Game Theory. Pub Date: 2020-09-01. DOI: arxiv-2009.00162. Qifan Zhang, Yue Guan, Panagiotis Tsiotras
We explore the use of policy approximation for reducing the computational
cost of learning Nash equilibria in multi-agent reinforcement learning
scenarios. We propose a new algorithm for zero-sum stochastic games in which
each agent simultaneously learns a Nash policy and an entropy-regularized
policy. The two policies help each other towards convergence: the former guides
the latter to the desired Nash equilibrium, while the latter serves as an
efficient approximation of the former. We demonstrate the possibility of using
the proposed algorithm to transfer previous training experiences to different
environments, enabling the agents to adapt quickly to new tasks. We also
provide a dynamic hyper-parameter scheduling scheme for further expedited
convergence. Empirical results on a number of stochastic games show that the
proposed algorithm converges to the Nash equilibrium while exhibiting a major
speed-up over existing algorithms.
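The core idea of coupling an entropy-regularized policy with equilibrium learning can be illustrated on the simplest case, a zero-sum matrix game. The sketch below is not the paper's algorithm; it is a minimal smoothed-fictitious-play loop in which each player's best response is softened into a softmax with temperature `tau` (the entropy regularization), and averaged iterates drive play toward the regularized equilibrium. All function names and step sizes are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropy_reg_equilibrium(A, tau=0.5, iters=5000):
    """Smoothed fictitious play for the zero-sum matrix game A.

    The row player maximizes x^T A y, the column player minimizes it.
    Entropy regularization with temperature `tau` turns each exact best
    response into a softmax, and a decaying step size averages the
    iterates toward the regularized (logit) equilibrium.
    Illustrative sketch only, not the algorithm proposed in the paper.
    """
    m, n = A.shape
    # Start the row player away from equilibrium to show convergence.
    x = np.array([0.9] + [0.1 / (m - 1)] * (m - 1))
    y = np.ones(n) / n
    for t in range(iters):
        lr = 1.0 / (t + 2)                  # decaying step size
        x_br = softmax(A @ y / tau)         # soft best response (maximizer)
        y_br = softmax(-A.T @ x / tau)      # soft best response (minimizer)
        x = (1 - lr) * x + lr * x_br
        y = (1 - lr) * y + lr * y_br
    return x, y

# Matching pennies: the (regularized) equilibrium is uniform play.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x, y = entropy_reg_equilibrium(A)
```

For matching pennies both mixed strategies settle near the uniform distribution. The temperature `tau` plays the role of the regularization hyper-parameter the paper schedules dynamically: higher values smooth the best responses more, trading equilibrium accuracy for faster, more stable updates.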
Updated: 2020-09-02