Playing Atari with few neurons
Autonomous Agents and Multi-Agent Systems (IF 2.0) | Pub Date: 2021-04-19 | DOI: 10.1007/s10458-021-09497-8
Giuseppe Cuccu, Julian Togelius, Philippe Cudré-Mauroux

We propose a new method for learning compact state representations and policies separately but simultaneously for policy approximation in vision-based applications such as Atari games. Approaches based on deep reinforcement learning typically map pixels directly to actions to enable end-to-end training. Internally, however, the deep neural network bears the responsibility of both extracting useful information and making decisions based on it, two objectives which can be addressed independently. Separating the image processing from the action selection allows for a better understanding of each task individually, and potentially for finding smaller policy representations, which is inherently interesting. Our approach learns state representations using a compact encoder based on two novel algorithms: (i) Increasing Dictionary Vector Quantization builds a dictionary of state representations that grows in size over time, allowing our method to address new observations as they appear in an open-ended online-learning context; and (ii) Direct Residuals Sparse Coding encodes observations as a function of the dictionary, aiming for the highest information inclusion by disregarding reconstruction error and maximizing code sparsity. As the dictionary grows, however, the encoder produces increasingly large inputs for the neural network; this issue is addressed with a new variant of the Exponential Natural Evolution Strategies algorithm which adapts the dimensionality of its probability distribution over the course of the run. We test our system on a selection of Atari games using tiny neural networks of only 6 to 18 neurons (depending on each game's controls). These networks are still capable of achieving results that are not much worse than, and occasionally superior to, the state of the art in direct policy search, which uses two orders of magnitude more neurons.
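
To make the two encoding algorithms more concrete, the following is a minimal Python/NumPy sketch of how Increasing Dictionary Vector Quantization (IDVQ) and Direct Residuals Sparse Coding (DRSC) could interact, assuming observations are flattened non-negative grayscale frames and codes are binary. The thresholds, the greedy atom-selection loop, and all function names are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np

def drsc_encode(obs, dictionary, max_nonzeros=10, min_residual=0.01):
    """Direct Residuals Sparse Coding (sketch): greedily pick the atom most
    similar to the remaining positive residual, setting that code entry to 1,
    until the code is sparse enough or little residual information remains.
    Reconstruction error is deliberately ignored."""
    code = np.zeros(len(dictionary), dtype=np.float32)
    residual = obs.astype(np.float32).copy()
    for _ in range(max_nonzeros):
        if len(dictionary) == 0 or residual.sum() < min_residual:
            break
        sims = dictionary @ residual              # similarity of each atom to the residual
        best = int(np.argmax(sims))
        if sims[best] <= 0:
            break
        code[best] = 1.0
        residual = np.maximum(residual - dictionary[best], 0.0)  # keep only the positive residual
    return code, residual

def idvq_update(obs, dictionary, grow_threshold=0.5):
    """Increasing Dictionary Vector Quantization (sketch): if the current
    dictionary covers the observation poorly, add the leftover residual as a
    new atom, so the dictionary grows in an open-ended, online fashion."""
    code, residual = drsc_encode(obs, dictionary)
    if residual.sum() > grow_threshold:
        dictionary = np.vstack([dictionary, residual[None, :]])
    return code, dictionary

# Example usage (hypothetical 64x64 frames): start from an empty dictionary
# and let it grow as new observations arrive.
dictionary = np.empty((0, 64 * 64), dtype=np.float32)
frame = np.random.default_rng(0).random(64 * 64).astype(np.float32)
code, dictionary = idvq_update(frame, dictionary)
```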

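The dimensionality adaptation mentioned for the Exponential Natural Evolution Strategies variant can be sketched as follows: when the dictionary (and therefore the policy's input layer) grows, the search distribution gains matching new dimensions while the already-learned ones are kept. The factorization of the covariance, the zero initialization of new means, and the isotropic initialization of new directions are assumptions made for illustration only.

```python
import numpy as np

def expand_distribution(mu, A, n_new, init_sigma=1.0):
    """Append `n_new` dimensions to a search distribution N(mu, A @ A.T):
    new means start at zero and new directions start uncorrelated with the
    existing ones, so previously adapted structure is preserved."""
    d = len(mu)
    mu_new = np.concatenate([mu, np.zeros(n_new)])
    A_new = np.zeros((d + n_new, d + n_new))
    A_new[:d, :d] = A                            # keep the learned covariance factor
    A_new[d:, d:] = init_sigma * np.eye(n_new)   # fresh, isotropic new dimensions
    return mu_new, A_new

# Example: after the dictionary gains one atom, each of 6 output neurons gets
# one new input weight, so the search space grows by 6 dimensions (hypothetical).
mu, A = np.zeros(12), np.eye(12)
mu, A = expand_distribution(mu, A, n_new=6)
```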

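Finally, the "tiny" networks themselves can be read as a single fully connected layer with one neuron per available game action (6 to 18 depending on the game), fed by the sparse code and read out greedily. The activation function, the absence of hidden layers, and the greedy action selection below are assumptions made for the sake of a self-contained example.

```python
import numpy as np

def tiny_policy(code, weights, biases):
    """One neuron per action: compute activations from the sparse code and
    pick the action with the highest activation."""
    activations = np.tanh(weights @ code + biases)
    return int(np.argmax(activations))

# Example with a 50-atom dictionary and 6 actions (hypothetical sizes); the
# weights would normally come from the evolution strategy, not random init.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(6, 50)), np.zeros(6)
action = tiny_policy(np.zeros(50, dtype=np.float32), W, b)
```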

Last updated: 2021-04-19