Q-Learning: Theory and Applications, Annual Review of Statistics and Its Application


Annual Review of Statistics and Its Application (IF 7.4), Pub Date: 2020-03-09, DOI: 10.1146/annurev-statistics-031219-041220
Jesse Clifton, Eric Laber

Q-learning, originally an incremental algorithm for estimating an optimal decision strategy in an infinite-horizon decision problem, now refers to a general class of reinforcement learning methods widely used in statistics and artificial intelligence. In the context of personalized medicine, finite-horizon Q-learning is the workhorse for estimating optimal treatment strategies, known as treatment regimes. Infinite-horizon Q-learning is also increasingly relevant in the growing field of mobile health. In computer science, Q-learning methods have achieved remarkable performance in domains such as game-playing and robotics. In this article, we (a) review the history of Q-learning in computer science and statistics, (b) formalize finite-horizon Q-learning within the potential outcomes framework and discuss the inferential difficulties for which it is infamous, and (c) review variants of infinite-horizon Q-learning and the exploration-exploitation problem, which arises in decision problems with a long time horizon. We close by discussing issues arising with the use of Q-learning in practice, including arguments for combining Q-learning with direct-search methods; sample size considerations for sequential, multiple assignment randomized trials; and possibilities for combining Q-learning with model-based methods.
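As a rough illustration of the incremental, infinite-horizon form of Q-learning mentioned in the abstract, the following is a minimal sketch of the standard tabular update with epsilon-greedy exploration. It is not taken from the article: the four-state chain environment, the function names `step` and `q_learning`, and the values of `GAMMA`, `ALPHA`, and `EPSILON` are all assumptions chosen for the example.

```python
import random

# Hypothetical toy problem (an assumption, not from the article):
# a chain of states 0..3 where state 3 is terminal.
N_STATES = 4
ACTIONS = (0, 1)                        # 0 = move left, 1 = move right
GAMMA, ALPHA, EPSILON = 0.9, 0.5, 0.2   # discount, step size, exploration rate


def step(state, action):
    """Deterministic dynamics: reward 1 only on reaching the terminal state."""
    nxt = max(state - 1, 0) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Epsilon-greedy: explore with probability EPSILON,
            # otherwise act greedily, breaking ties at random.
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # Incremental update toward the one-step bootstrapped target.
            target = reward + (0.0 if done else GAMMA * max(q[nxt]))
            q[state][action] += ALPHA * (target - q[state][action])
            if done:
                break
            state = nxt
    return q


q = q_learning()
# Greedy action in each non-terminal state under the learned Q-function.
greedy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(greedy)
```

In this toy setting the greedy policy read off the learned table should move right in every non-terminal state. The `EPSILON` parameter is one simple way to handle the exploration-exploitation trade-off the abstract refers to; the finite-horizon, regression-based Q-learning used for treatment regimes is a different procedure and is not shown here.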

Updated: 2020-03-09