Offline Reinforcement Learning Hands-On
arXiv - CS - Machine Learning. Pub Date: 2020-11-29, DOI: arxiv-2011.14379
Louis Monier, Jakub Kmec, Alexandre Laterre, Thomas Pierrot, Valentin Courgeau, Olivier Sigaud, Karim Beguir

Offline Reinforcement Learning (RL) aims to turn large datasets into powerful decision-making engines without any online interaction with the environment. This great promise has motivated a large body of research hoping to replicate the success RL has enjoyed in simulation settings. This work aims to reflect on these efforts from a practitioner's viewpoint. We start by discussing the dataset properties that we hypothesise can characterise which offline methods will be the most successful. We then verify these claims through a set of experiments on purpose-built datasets generated from environments with both discrete and continuous action spaces. We experimentally validate that diversity and high-return examples in the data are crucial to the success of offline RL, and show that behavioural cloning remains a strong contender compared to more recent offline RL methods. Overall, this work stands as a tutorial to help people build their intuition on today's offline RL methods and their applicability.
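In the spirit of the tutorial, the sketch below illustrates behavioural cloning, the baseline the abstract highlights: a purely supervised regression from logged states to logged actions, with no environment interaction at any point. It is a minimal illustration, not the paper's implementation; the dataset, network sizes, and hyperparameters are assumed for the example (here a continuous-action setting with random placeholder data).

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical offline dataset: (state, action) pairs logged by some
# behaviour policy. Shapes and contents are placeholders for illustration.
states = torch.randn(10_000, 8)   # e.g. 8-dimensional observations
actions = torch.randn(10_000, 2)  # e.g. 2-dimensional continuous actions

loader = DataLoader(TensorDataset(states, actions),
                    batch_size=256, shuffle=True)

# A small MLP policy mapping states to actions (sizes are illustrative).
policy = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 2),
)
optimiser = torch.optim.Adam(policy.parameters(), lr=3e-4)

# Behavioural cloning: minimise the regression loss between the policy's
# output and the dataset actions. The environment is never queried.
for epoch in range(10):
    for s, a in loader:
        loss = nn.functional.mse_loss(policy(s), a)
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()
```

Because the objective is plain supervised learning, the quality of the cloned policy is bounded by the data it imitates, which is why the diversity and high-return coverage of the dataset matter so much in the offline setting.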

Updated: 2020-12-01