Pre-training with non-expert human demonstration for deep reinforcement learning
The Knowledge Engineering Review (IF 2.8), Pub Date: 2019-07-26, DOI: 10.1017/s0269888919000055
Gabriel V. de la Cruz, Yunshu Du, Matthew E. Taylor

Deep reinforcement learning (deep RL) has achieved superior performance in complex sequential tasks by using deep neural networks as function approximators to learn directly from raw input images. However, learning directly from raw images is data inefficient: the agent must learn a feature representation of complex states in addition to learning a policy. As a result, deep RL typically suffers from slow learning speeds and often requires a prohibitively large amount of training time and data to reach reasonable performance, making it inapplicable to real-world settings where data are expensive. In this work, we improve data efficiency in deep RL by addressing one of these two learning goals, feature learning. We leverage supervised learning to pre-train on a small set of non-expert human demonstrations and empirically evaluate our approach using the asynchronous advantage actor-critic algorithm in the Atari domain. Our results show significant improvements in learning speed, even when the provided demonstrations are noisy and of low quality.
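The following is a minimal sketch (not the authors' released code) of the pre-training idea described above: the convolutional encoder of an A3C-style network is first fit to non-expert (state, action) pairs with a supervised cross-entropy loss, and the resulting weights then initialize the RL phase. The network shape, hyperparameters, and the `demo_loader` data source are illustrative assumptions.

```python
import torch
import torch.nn as nn

class A3CNet(nn.Module):
    """A3C-style network: shared conv encoder, policy (actor) and value (critic) heads."""
    def __init__(self, n_actions: int):
        super().__init__()
        # Conv encoder over stacked 84x84 Atari frames (the feature-learning part).
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256), nn.ReLU(),
        )
        self.policy = nn.Linear(256, n_actions)  # actor head
        self.value = nn.Linear(256, 1)           # critic head

    def forward(self, x):
        h = self.encoder(x)
        return self.policy(h), self.value(h)

def pretrain_on_demos(net, demo_loader, epochs=5, lr=1e-4):
    """Supervised pre-training: predict the human's action from raw frames.

    demo_loader is assumed to yield (frames, human_actions) batches, with
    frames shaped (B, 4, 84, 84) and human_actions as integer action indices.
    """
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for frames, human_actions in demo_loader:
            logits, _ = net(frames)
            loss = loss_fn(logits, human_actions)  # noisy, non-expert labels are tolerated
            opt.zero_grad()
            loss.backward()
            opt.step()
    return net  # the pre-trained encoder (and optionally the policy head) initializes A3C
```

After this pre-training step, the same network would be handed to the usual asynchronous actor-critic training loop; the point is that the encoder no longer has to learn its feature representation from scratch during RL.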
