On the convergence of reinforcement learning with Monte Carlo Exploring Starts
Automatica (IF 4.8), Pub Date: 2021-05-07, DOI: 10.1016/j.automatica.2021.109693
Jun Liu

A basic simulation-based reinforcement learning algorithm is the Monte Carlo Exploring Starts (MCES) method, also known as optimistic policy iteration, in which the value function is approximated by simulated returns and a greedy policy is selected at each iteration. The convergence of this algorithm in the general setting has been an open question. In this paper, we investigate the convergence of this algorithm for the case with undiscounted costs, also known as the stochastic shortest path problem. The results complement existing partial results on this topic and thereby help further settle the open problem.
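To make the algorithm under study concrete, the following is a minimal tabular sketch of Monte Carlo Exploring Starts in a cost-minimization (stochastic shortest path) setting, where returns are undiscounted. The `env` interface (`states`, `actions`, `reset`, `step`) is hypothetical and not taken from the paper; the sketch only illustrates the two ingredients the abstract refers to: value estimates formed by averaging simulated returns, and a greedy policy update at each iteration.

```python
import random
from collections import defaultdict

def mces(env, num_episodes=10_000, gamma=1.0):
    """First-visit Monte Carlo Exploring Starts (sketch).

    Assumes a hypothetical env exposing `states`, `actions`, `reset(s)` to
    force the starting state, and `step(a)` returning (next_state, cost, done).
    With gamma == 1.0 the costs are undiscounted (stochastic shortest path).
    """
    Q = defaultdict(float)            # state-action value estimates
    counts = defaultdict(int)         # visit counts for incremental averaging
    policy = {s: random.choice(env.actions) for s in env.states}

    for _ in range(num_episodes):
        # Exploring start: every (state, action) pair is chosen with
        # positive probability as the first pair of the episode.
        s, a = random.choice(env.states), random.choice(env.actions)
        env.reset(s)
        episode, done = [], False
        while not done:
            s_next, cost, done = env.step(a)
            episode.append((s, a, cost))
            s = s_next
            a = policy[s] if not done else None

        # Backward pass: accumulate returns and average them into Q,
        # then improve the policy greedily (costs are minimized).
        G, visited = 0.0, set()
        for s, a, cost in reversed(episode):
            G = cost + gamma * G
            if (s, a) not in visited:          # first-visit update
                visited.add((s, a))
                counts[(s, a)] += 1
                Q[(s, a)] += (G - Q[(s, a)]) / counts[(s, a)]
                policy[s] = min(env.actions, key=lambda act: Q[(s, act)])

    return policy, Q
```

Note that in the undiscounted setting an episode under a non-proper greedy policy need not terminate; handling this is part of why the convergence analysis is delicate.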




Updated: 2021-05-08