Towards Effective Context for Meta-Reinforcement Learning: an Approach based on Contrastive Learning
arXiv - CS - Machine Learning. Pub Date: 2020-09-29, DOI: arxiv-2009.13891
Haotian Fu, Hongyao Tang, Jianye Hao, Chen Chen, Xidong Feng, Dong Li, Wulong Liu

Context, the embedding of previously collected trajectories, is a powerful construct for Meta-Reinforcement Learning (Meta-RL) algorithms. By conditioning on an effective context, Meta-RL policies can easily generalize to new tasks within a few adaptation steps. We argue that improving the quality of context involves answering two questions: 1. How to train a compact and sufficient encoder that can embed the task-specific information contained in prior trajectories? 2. How to collect informative trajectories whose corresponding context reflects the specification of the tasks? To this end, we propose a novel Meta-RL framework called CCM (Contrastive learning augmented Context-based Meta-RL). We first focus on the contrastive nature of different tasks and leverage it to train a compact and sufficient context encoder. Further, we train a separate exploration policy and theoretically derive a new information-gain-based objective that aims to collect informative trajectories in a few steps. Empirically, we evaluate our approach on common benchmarks as well as several complex sparse-reward environments. The experimental results show that CCM outperforms state-of-the-art algorithms by addressing the two problems above.
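The contrastive training of the context encoder described above is typically instantiated with an InfoNCE-style objective: embeddings of trajectories drawn from the same task serve as positive pairs, while embeddings from other tasks serve as negatives. The following is a minimal sketch of such a loss, not the paper's exact formulation; the function name, the fixed temperature, and the use of cosine similarity are illustrative assumptions.

```python
import math

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _normalize(v):
    # Unit-normalize so dot products become cosine similarities.
    n = math.sqrt(_dot(v, v)) or 1.0
    return [a / n for a in v]

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Hypothetical InfoNCE-style contrastive objective.

    anchor:    context embedding of a trajectory from some task
    positive:  embedding of another trajectory from the SAME task
    negatives: embeddings of trajectories from DIFFERENT tasks

    Returns the cross-entropy of picking the positive among all
    candidates, which pulls same-task contexts together and pushes
    different-task contexts apart.
    """
    a = _normalize(anchor)
    # Similarity logits: positive first, then all negatives.
    logits = [_dot(a, _normalize(positive)) / temperature]
    logits += [_dot(a, _normalize(n)) / temperature for n in negatives]
    # Numerically stable softmax cross-entropy with the positive at index 0.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[0] / sum(exps))
```

With an aligned positive and an orthogonal negative the loss is near zero; swapping them makes it large, which is the gradient signal that shapes the task embedding space.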

Updated: 2020-10-08