Increasing sample efficiency in deep reinforcement learning using generative environment modelling
Expert Systems (IF 3.3). Pub Date: 2020-03-01. DOI: 10.1111/exsy.12537
Per-Arne Andersen, Morten Goodwin, Ole-Christoffer Granmo

Reinforcement learning is a broad family of learning algorithms that has recently shown astonishing performance in controlling agents in environments modelled as Markov decision processes. Several unsolved problems in the current state of the art cause algorithms to learn suboptimal policies, or even to diverge and collapse completely. Part of the solution to these issues may lie in short- and long-term planning, memory management and exploration for reinforcement learning algorithms. Games are frequently used to benchmark reinforcement learning algorithms because they provide flexible, reproducible and easy-to-control environments. Nevertheless, few games make it possible to observe how an algorithm performs exploration, memorization and planning. This article presents the Dreaming Variational Autoencoder with Stochastic Weight Averaging and Generative Adversarial Networks (DVAE-SWAGAN), a neural-network-based generative modelling architecture for exploration in environments with sparse feedback. We present deep maze, a novel and flexible maze game-engine that challenges DVAE-SWAGAN with partially and fully observable state-spaces, long-horizon tasks, and deterministic and stochastic problems. We report results for different variants of the algorithm and encourage future study of reinforcement learning driven by generative exploration.
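The abstract names stochastic weight averaging (SWA) as one ingredient of the architecture without describing it. The following is a minimal illustrative sketch, not the authors' implementation: SWA maintains a running average of model weights across training checkpoints, and the incremental-mean update below (with hypothetical helper `swa_update`) is the standard way to fold in one more checkpoint.

```python
import numpy as np

def swa_update(avg_weights, new_weights, n_averaged):
    """Fold one more checkpoint into the running average of model weights.

    avg_weights / new_weights: lists of numpy arrays (one per layer).
    n_averaged: number of checkpoints already folded into avg_weights.
    """
    return [
        avg + (w - avg) / (n_averaged + 1)  # incremental mean update
        for avg, w in zip(avg_weights, new_weights)
    ]

# Toy usage: average three "checkpoints" of a single-layer weight vector.
checkpoints = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
avg = checkpoints[0]
for n, w in enumerate(checkpoints[1:], start=1):
    avg = swa_update([avg], [w], n)[0]
print(avg)  # arithmetic mean of the three checkpoints: [3. 4.]
```

In practice the averaged weights are kept alongside the live training weights and substituted in at evaluation time; frameworks such as PyTorch provide this as a built-in utility.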
