Q-Learning: Theory and Applications
Annual Review of Statistics and Its Application (IF 7.4), Pub Date: 2020-03-09, DOI: 10.1146/annurev-statistics-031219-041220
Jesse Clifton, Eric Laber

Q-learning, originally an incremental algorithm for estimating an optimal decision strategy in an infinite-horizon decision problem, now refers to a general class of reinforcement learning methods widely used in statistics and artificial intelligence. In the context of personalized medicine, finite-horizon Q-learning is the workhorse for estimating optimal treatment strategies, known as treatment regimes. Infinite-horizon Q-learning is also increasingly relevant in the growing field of mobile health. In computer science, Q-learning methods have achieved remarkable performance in domains such as game-playing and robotics. In this article, we (a) review the history of Q-learning in computer science and statistics, (b) formalize finite-horizon Q-learning within the potential outcomes framework and discuss the inferential difficulties for which it is infamous, and (c) review variants of infinite-horizon Q-learning and the exploration-exploitation problem, which arises in decision problems with a long time horizon. We close by discussing issues arising with the use of Q-learning in practice, including arguments for combining Q-learning with direct-search methods; sample size considerations for sequential, multiple assignment randomized trials; and possibilities for combining Q-learning with model-based methods.


Updated: 2020-03-09