REPAINT: Knowledge Transfer in Deep Actor-Critic Reinforcement Learning
arXiv - CS - Robotics. Pub Date: 2020-11-24, DOI: arxiv-2011.11827
Yunzhe Tao, Sahika Genc, Tao Sun, Sunil Mallya

Accelerating the learning of complex tasks by leveraging previously learned tasks has been one of the most challenging problems in reinforcement learning, especially when the similarity between the source and target tasks is low or unknown. In this work, we propose a REPresentation-And-INstance Transfer algorithm (REPAINT) for the deep actor-critic reinforcement learning paradigm. For representation transfer, we adopt a kickstarted training method that uses a pre-trained teacher policy through an auxiliary cross-entropy loss. For instance transfer, we develop a sampling approach, advantage-based experience replay, on transitions collected under the teacher policy, where only the samples with high advantage estimates are retained for the policy update. We consider both learning an unseen target task by transferring from previously learned teacher tasks, and learning a partially unseen task composed of multiple sub-tasks by transferring from a pre-learned teacher sub-task. In several benchmark experiments, REPAINT significantly reduces the total training time and improves the asymptotic performance compared to training with no prior knowledge and to other baselines.
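The abstract names two concrete mechanisms, an auxiliary cross-entropy loss toward the teacher policy and advantage-filtered replay of teacher transitions, so a short sketch may help make them concrete. Below is a minimal PyTorch sketch assuming a discrete action space; the function names, the weight `beta`, and the zero advantage threshold are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def repaint_actor_loss(student_logits, teacher_logits, pg_loss, beta):
    """Representation transfer (kickstarting): add an auxiliary
    cross-entropy term that pulls the student's action distribution
    toward the pre-trained teacher's. `beta` weights the auxiliary
    term relative to the usual policy-gradient loss."""
    teacher_probs = F.softmax(teacher_logits, dim=-1)
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    aux_ce = -(teacher_probs * student_log_probs).sum(dim=-1).mean()
    return pg_loss + beta * aux_ce

def advantage_based_replay(transitions, advantages, threshold=0.0):
    """Instance transfer: from transitions collected under the teacher
    policy, retain only those whose advantage estimate (computed with
    the student's critic) exceeds a threshold, and use them for the
    policy update."""
    return [t for t, adv in zip(transitions, advantages) if adv > threshold]

if __name__ == "__main__":
    student_logits = torch.randn(32, 4, requires_grad=True)  # student action logits
    teacher_logits = torch.randn(32, 4)                      # frozen teacher logits
    pg_loss = torch.tensor(0.25)                             # placeholder PG loss
    loss = repaint_actor_loss(student_logits, teacher_logits, pg_loss, beta=0.5)
    loss.backward()
```

A natural design choice, left out of the sketch for brevity, is to decay `beta` over training so that the student eventually optimizes the target-task objective alone rather than continuing to imitate the teacher.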

Updated: 2020-11-25