Trying AGAIN instead of Trying Longer: Prior Learning for Automatic Curriculum Learning
arXiv - CS - Artificial Intelligence Pub Date : 2020-04-07 , DOI: arxiv-2004.03168
Rémy Portelas, Katja Hofmann, Pierre-Yves Oudeyer

A major challenge in the Deep RL (DRL) community is to train agents able to generalize over unseen situations, which is often approached by training them on a diversity of tasks (or environments). A powerful method to foster diversity is to procedurally generate tasks by sampling their parameters from a multi-dimensional distribution, which in particular makes it possible to propose a different task for each training episode. In practice, obtaining the high diversity of training tasks necessary for generalization requires complex procedural generation systems. With such generators, it is hard to obtain prior knowledge about which subset of tasks is actually learnable at all (many generated tasks may be unlearnable), what their relative difficulty is, and which task distribution ordering is most efficient for training. A typical solution in such cases is to rely on some form of Automated Curriculum Learning (ACL) to adapt the sampling distribution. One limitation of current approaches is that they need to explore the task space to detect progress niches over time, which wastes training time. Additionally, we hypothesize that the noise this exploration induces in the training data may impair the performance of brittle DRL learners. We address this problem by proposing a two-stage ACL approach where 1) a teacher algorithm first learns to train a DRL agent with a high-exploration curriculum, and then 2) distills the priors learned from this first run to generate an "expert curriculum" used to re-train the same agent from scratch. Besides demonstrating a 50% improvement on average over the current state of the art, the objective of this work is to give a first example of a new research direction oriented towards refining ACL techniques over multiple learners, which we call Classroom Teaching.
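The two-stage procedure lends itself to a short illustration. The sketch below is a hypothetical toy example, not the paper's implementation: the task space, learner, and learning-progress measure are stand-ins, and it only shows the overall structure of a high-exploration first run whose recorded progress niches are reused as a sampling prior to re-train a fresh agent from scratch.

```python
# Minimal sketch (assumed, not the authors' code) of the two-stage "Trying AGAIN" idea:
# stage 1 runs an exploratory teacher that records where learning progress occurred,
# stage 2 retrains a fresh agent using that recorded prior as an "expert curriculum".
import random
from collections import defaultdict

def make_learner():
    """Toy stand-in for a DRL agent: a skill level per task bucket."""
    return defaultdict(float)

def train_step(learner, task):
    """Pretend training episode: tasks close to 1.0 are effectively unlearnable."""
    bucket = round(task, 1)
    before = learner[bucket]
    learner[bucket] = min(1.0, before + 0.1 * (1.0 - task))
    return learner[bucket] - before  # learning progress on this episode

def stage1_exploratory_run(episodes=2000):
    """High-exploration curriculum: sample tasks uniformly, record progress niches."""
    learner, progress = make_learner(), defaultdict(float)
    for _ in range(episodes):
        task = random.random()                     # uniform exploration of the task space
        progress[round(task, 1)] += train_step(learner, task)
    return progress                                # learned prior over the task space

def stage2_expert_run(prior, episodes=2000):
    """Re-train a *fresh* agent from scratch, sampling tasks in proportion to the prior."""
    learner = make_learner()
    buckets, weights = zip(*prior.items())
    for _ in range(episodes):
        task = random.choices(buckets, weights=weights)[0]
        train_step(learner, task)
    return learner

prior = stage1_exploratory_run()
expert_trained = stage2_expert_run(prior)
print(sorted(expert_trained.items()))
```

In this toy version the prior is just accumulated per-bucket progress; in a real DRL setting the same structure would wrap an actual teacher algorithm and agent, with the key design choice being that stage 2 restarts the learner instead of continuing the noisy exploratory run.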

Updated: 2020-04-08