Emergent Solutions to High-Dimensional Multi-Task Reinforcement Learning
Evolutionary Computation (IF 6.8), Pub Date: 2018-09-01, DOI: 10.1162/evco_a_00232
Stephen Kelly, Malcolm I. Heywood

Algorithms that learn through environmental interaction and delayed rewards, or reinforcement learning (RL), increasingly face the challenge of scaling to dynamic, high-dimensional, and partially observable environments. Significant attention is being paid to frameworks from deep learning, which scale to high-dimensional data by decomposing the task through multilayered neural networks. While effective, the representation is complex and computationally demanding. In this work, we propose a framework based on genetic programming which adaptively complexifies policies through interaction with the task. We make a direct comparison with several deep reinforcement learning frameworks in the challenging Atari video game environment as well as more traditional reinforcement learning frameworks based on a priori engineered features. Results indicate that the proposed approach matches the quality of deep learning while being a minimum of three orders of magnitude simpler with respect to model complexity. This results in real-time operation of the champion RL agent without recourse to specialized hardware support. Moreover, the approach is capable of evolving solutions to multiple game titles simultaneously with no additional computational cost. In this case, agent behaviours for an individual game as well as single agents capable of playing all games emerge from the same evolutionary run.
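The abstract describes evolving genetic-programming policies that adaptively complexify through interaction with the task. As a rough illustration of that general idea only (not the authors' actual framework), the sketch below evolves a population of linear register programs as policies on a contrived toy task, where mutation can append instructions so that program complexity grows only when selection favours it. The task, operator set, and all names here are assumptions made for illustration.

```python
# Illustrative sketch only: a minimal GP-for-RL loop, NOT the authors' system.
# The toy task, register encoding, and operators are all assumptions.
import random

NUM_INPUTS = 4      # toy observation size (assumption)
NUM_ACTIONS = 2     # toy action count (assumption)
OPS = ('+', '-', '*')

class Program:
    """A linear GP program: a list of (op, dst, src) register instructions."""
    def __init__(self, instructions=None):
        self.instructions = instructions or [self.random_instruction() for _ in range(4)]

    @staticmethod
    def random_instruction():
        return (random.choice(OPS),
                random.randrange(NUM_ACTIONS),   # destination (action) register
                random.randrange(NUM_INPUTS))    # source observation index

    def act(self, obs):
        """Run the program over the observation, pick the best-scoring action."""
        reg = [0.0] * NUM_ACTIONS
        for op, dst, src in self.instructions:
            if op == '+':   reg[dst] += obs[src]
            elif op == '-': reg[dst] -= obs[src]
            else:           reg[dst] *= obs[src]
        return max(range(NUM_ACTIONS), key=lambda a: reg[a])

    def mutate(self):
        child = Program(list(self.instructions))
        if random.random() < 0.5:                # complexify: grow the program
            child.instructions.append(self.random_instruction())
        else:                                    # otherwise rewrite one instruction
            i = random.randrange(len(child.instructions))
            child.instructions[i] = self.random_instruction()
        return child

def episode_return(policy, steps=50):
    """Contrived partially observable task: reward for tracking a flipping signal."""
    total, x = 0.0, 1.0
    for t in range(steps):
        obs = [x, x * 0.5, 1.0, -x]              # made-up observation vector
        a = policy.act(obs)
        total += 1.0 if (a == 0) == (x > 0) else -1.0
        if t % 7 == 0:                           # hidden state flips occasionally
            x = -x
    return total

population = [Program() for _ in range(20)]
for gen in range(30):
    ranked = sorted(population, key=episode_return, reverse=True)
    parents = ranked[:5]                         # truncation selection
    population = parents + [random.choice(parents).mutate() for _ in range(15)]

best = max(population, key=episode_return)
print('best return:', episode_return(best), '| program length:', len(best.instructions))
```

The append branch in mutate() is the simplest analogue of the adaptive complexification the abstract refers to: programs start small and only grow when longer variants survive selection. The paper's actual mechanism is substantially richer than this toy loop.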

Updated: 2018-09-01