Model-based Reinforcement Learning: A Survey
arXiv - CS - Machine Learning Pub Date : 2020-06-30 , DOI: arxiv-2006.16712
Thomas M. Moerland, Joost Broekens, Catholijn M. Jonker

Sequential decision making, commonly formalized as Markov Decision Process (MDP) optimization, is a key challenge in artificial intelligence. Two key approaches to this problem are reinforcement learning (RL) and planning. This paper presents a survey of the integration of both fields, better known as model-based reinforcement learning. Model-based RL has two main steps. First, we systematically cover approaches to dynamics model learning, including challenges like dealing with stochasticity, uncertainty, partial observability, and temporal abstraction. Second, we present a systematic categorization of planning-learning integration, including aspects like: where to start planning, what budgets to allocate to planning and real data collection, how to plan, and how to integrate planning in the learning and acting loop. After these two key sections, we also discuss the potential benefits of model-based RL, like enhanced data efficiency, targeted exploration, and improved stability. Throughout the survey, we also draw connections to several related RL fields, like hierarchical RL and transfer, and other research disciplines, like behavioural psychology. Altogether, the survey presents a broad conceptual overview of planning-learning combinations for MDP optimization.
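To make the two steps concrete, below is a minimal tabular Dyna-Q sketch in Python, a classic planning-learning combination in the spirit the abstract describes. It is an illustrative assumption, not an algorithm from the survey itself: the toy chain MDP, constants, and helper names are invented here. The loop learns a dynamics model from real transitions and then spends a small planning budget on imagined transitions sampled from that model.

```python
# Minimal Dyna-Q sketch (illustrative only; toy MDP and constants are assumptions).
# Step 1: learn a dynamics model from real experience.
# Step 2: use the model to plan, i.e. generate extra value updates.
import random
from collections import defaultdict

# Toy 1-D chain MDP: states 0..4, actions {-1, +1}, reward 1 at the right end.
N_STATES, ACTIONS, GAMMA, ALPHA, EPS = 5, (-1, +1), 0.95, 0.1, 0.1

def step(s, a):
    s_next = min(max(s + a, 0), N_STATES - 1)
    return s_next, 1.0 if s_next == N_STATES - 1 else 0.0

Q = defaultdict(float)   # action-value estimates
model = {}               # learned (deterministic, tabular) dynamics model: (s, a) -> (s', r)

def epsilon_greedy(s):
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])

for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        a = epsilon_greedy(s)
        s_next, r = step(s, a)                         # real data collection
        # Direct RL update from the real transition.
        Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s_next, b)] for b in ACTIONS) - Q[(s, a)])
        # Dynamics model learning (here a trivial tabular memory).
        model[(s, a)] = (s_next, r)
        # Planning: replay imagined transitions sampled from the learned model.
        for _ in range(5):                             # planning budget per real step
            (ps, pa), (ps_next, pr) = random.choice(list(model.items()))
            Q[(ps, pa)] += ALPHA * (pr + GAMMA * max(Q[(ps_next, b)] for b in ACTIONS) - Q[(ps, pa)])
        s = s_next

# Greedy policy recovered from the learned values (should point toward the goal state).
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)})
```

The fixed planning budget of 5 imagined updates per real step is one concrete instance of the "what budgets to allocate to planning and real data collection" question the survey categorizes; the extra model-based updates are what give planning-learning combinations their data-efficiency advantage.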

Updated: 2020-07-24