Is Q-Learning Provably Efficient? An Extended Analysis
arXiv - CS - Machine Learning. Pub Date: 2020-09-22. DOI: arXiv:2009.10396. Authors: Kushagra Rastogi, Jonathan Lee, Fabrice Harel-Canada, Aditya Joglekar
This work extends the analysis of the theoretical results presented in the paper "Is Q-Learning Provably Efficient?" by Jin et al. We include a survey of related research to contextualize the need for stronger theoretical guarantees for one of the most important threads of model-free reinforcement learning. We also expound upon the reasoning used in the proofs to highlight the critical steps leading to the main result: Q-learning with UCB exploration achieves a sample efficiency that matches the optimal regret achievable by any model-based approach.
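The algorithm the abstract refers to can be summarized in a short sketch. The following is an illustrative, simplified rendition of tabular episodic Q-learning with a Hoeffding-style UCB bonus in the spirit of Jin et al.; the constant `c`, the toy environment interface (`step`, `reward`), and the fixed initial state are assumptions for illustration, not the paper's exact specification.

```python
import numpy as np

def q_learning_ucb(S, A, H, K, step, reward, c=1.0, p=0.01, seed=0):
    """Sketch of tabular Q-learning with UCB-Hoeffding exploration.

    S, A: number of states and actions; H: horizon; K: number of episodes.
    step(s, a, h, rng) -> next state; reward(s, a, h) -> reward in [0, 1].
    """
    rng = np.random.default_rng(seed)
    iota = np.log(S * A * H * K / p)      # log factor appearing in the bonus
    Q = np.full((H, S, A), float(H))      # optimistic initialization at H
    N = np.zeros((H, S, A), dtype=int)    # visit counts per (h, s, a)
    for _ in range(K):
        s = 0                             # fixed initial state (assumption)
        for h in range(H):
            a = int(np.argmax(Q[h, s]))   # act greedily w.r.t. optimistic Q
            s_next = step(s, a, h, rng)
            r = reward(s, a, h)
            N[h, s, a] += 1
            t = N[h, s, a]
            alpha = (H + 1) / (H + t)     # rescaled learning rate from the paper
            bonus = c * np.sqrt(H**3 * iota / t)  # Hoeffding-style UCB bonus
            V_next = min(H, Q[h + 1, s_next].max()) if h + 1 < H else 0.0
            target = r + V_next + bonus
            Q[h, s, a] = (1 - alpha) * Q[h, s, a] + alpha * target
            Q[h, s, a] = min(Q[h, s, a], H)  # keep Q bounded by the horizon
            s = s_next
    return Q
```

The choice of learning rate alpha_t = (H+1)/(H+t) is one of the critical steps the analysis highlights: it weights recent updates heavily enough that estimation error does not compound over the horizon, which is what ultimately yields the regret bound matching model-based methods.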
Updated: 2020-09-23