Evolving Reinforcement Learning Algorithms
arXiv - CS - Neural and Evolutionary Computing. Pub Date: 2021-01-08, DOI: arxiv-2101.03958
John D. Co-Reyes, Yingjie Miao, Daiyi Peng, Esteban Real, Sergey Levine, Quoc V. Le, Honglak Lee, Aleksandra Faust

We propose a method for meta-learning reinforcement learning algorithms by searching over the space of computational graphs which compute the loss function for a value-based model-free RL agent to optimize. The learned algorithms are domain-agnostic and can generalize to new environments not seen during training. Our method can both learn from scratch and bootstrap off known existing algorithms, like DQN, enabling interpretable modifications which improve performance. Learning from scratch on simple classical control and gridworld tasks, our method rediscovers the temporal-difference (TD) algorithm. Bootstrapped from DQN, we highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games. The analysis of the learned algorithm behavior shows resemblance to recently proposed RL algorithms that address overestimation in value-based methods.
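As a concrete illustration of the kind of search space the abstract describes, the sketch below expresses a value-based loss as a small computational graph of primitive operations and evaluates it on a batch of transitions, recovering the standard one-step TD (DQN) loss as one point in such a space. The primitive set, the node encoding, and the names (PRIMITIVES, evaluate_graph, dqn_loss_graph) are illustrative assumptions, not the paper's actual domain-specific language or search procedure.

```python
import numpy as np

# Illustrative primitive ops a loss-graph node might apply (hypothetical set,
# not the paper's actual operator vocabulary).
PRIMITIVES = {
    "sub": lambda a, b: a - b,
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "max": lambda a, b: np.maximum(a, b),
    "square": lambda a: a ** 2,
    "stop_grad": lambda a: a,  # placeholder; a real framework would block gradients here
}

def evaluate_graph(nodes, inputs):
    """Evaluate a loss expressed as an ordered list of (name, (op, arg_names)) nodes.

    `inputs` holds the per-transition quantities the agent exposes, e.g.
    Q(s,a), max_a' Q_target(s',a'), the reward r, and the discount gamma.
    The last node is taken to be the per-transition scalar loss.
    """
    values = dict(inputs)
    for name, (op, args) in nodes:
        values[name] = PRIMITIVES[op](*(values[a] for a in args))
    return values[nodes[-1][0]]

# The classic one-step TD / DQN loss,
# (Q(s,a) - (r + gamma * max_a' Q_target(s',a')))^2, written as such a graph.
dqn_loss_graph = [
    ("discounted", ("mul", ["gamma", "q_target_max"])),
    ("target",     ("add", ["reward", "discounted"])),
    ("td_error",   ("sub", ["q_sa", "target"])),
    ("loss",       ("square", ["td_error"])),
]

batch = {
    "q_sa": np.array([1.0, 0.5]),
    "q_target_max": np.array([0.8, 0.9]),
    "reward": np.array([0.0, 1.0]),
    "gamma": 0.99,
}
print(evaluate_graph(dqn_loss_graph, batch))  # per-transition squared TD error
```

A search over graphs of this form would mutate the node list (operators and wiring) and score each candidate by training agents with the resulting loss, which is why a rediscovered TD update and interpretable DQN variants are plausible outputs of such a procedure.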

Updated: 2021-01-12