Program Synthesis Guided Reinforcement Learning
arXiv - CS - Artificial Intelligence. Pub Date: 2021-02-22, DOI: arxiv-2102.11137
Yichen Yang, Jeevana Priya Inala, Osbert Bastani, Yewen Pu, Armando Solar-Lezama, Martin Rinard

A key challenge for reinforcement learning is solving long-horizon planning and control problems. Recent work has proposed leveraging programs to help guide the learning algorithm in these settings. However, these approaches impose a high manual burden on the user since they must provide a guiding program for every new task they seek to achieve. We propose an approach that leverages program synthesis to automatically generate the guiding program. A key challenge is how to handle partially observable environments. We propose model predictive program synthesis, which trains a generative model to predict the unobserved portions of the world, and then synthesizes a program based on samples from this model in a way that is robust to its uncertainty. We evaluate our approach on a set of challenging benchmarks, including a 2D Minecraft-inspired "craft" environment where the agent must perform a complex sequence of subtasks to achieve its goal, a box-world environment that requires abstract reasoning, and a variant of the craft environment where the agent is a MuJoCo Ant. Our approach significantly outperforms several baselines, and performs essentially as well as an oracle that is given an effective program.
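The following is a minimal sketch of the idea described in the abstract, not the authors' implementation: sample completions of the partially observed world from a generative model, then prefer the candidate program that succeeds on the largest fraction of those samples. All names here (`sample_world_completions`, `simulate`, the toy world dictionaries) are hypothetical stand-ins introduced purely for illustration.

```python
# Hypothetical sketch of model predictive program synthesis:
# sample world completions, then pick the program most robust to uncertainty.
import random
from typing import List

def sample_world_completions(observed: dict, num_samples: int) -> List[dict]:
    """Stand-in for a learned generative model: guess the unobserved part
    of the world, conditioned (trivially here) on what was observed."""
    return [{**observed, "hidden_item": random.choice(["key", "gem"])}
            for _ in range(num_samples)]

def simulate(program: List[str], world: dict) -> bool:
    """Toy simulator: the program succeeds if it collects whatever item
    is actually hidden in this completion of the world."""
    return f"collect_{world['hidden_item']}" in program

def synthesize_robust_program(observed: dict,
                              candidates: List[List[str]],
                              num_samples: int = 50) -> List[str]:
    """Return the candidate program with the highest success rate across
    sampled world completions, i.e. the one most robust to uncertainty."""
    samples = sample_world_completions(observed, num_samples)
    def score(prog: List[str]) -> float:
        return sum(simulate(prog, w) for w in samples) / len(samples)
    return max(candidates, key=score)

if __name__ == "__main__":
    observed = {"agent_pos": (0, 0)}
    candidates = [["collect_key"], ["collect_gem"], ["collect_key", "collect_gem"]]
    print(synthesize_robust_program(observed, candidates))
```

In this toy setting the third candidate wins because it succeeds under every sampled completion, which mirrors the abstract's point that the synthesized guiding program should hedge against what the agent cannot observe.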

Updated: 2021-02-23