Lifelong Policy Gradient Learning of Factored Policies for Faster Training Without Forgetting
arXiv - CS - Artificial Intelligence. Pub Date: 2020-07-14. DOI: arxiv-2007.07011. Jorge A. Mendez, Boyu Wang, and Eric Eaton
Policy gradient methods have shown success in learning control policies for
high-dimensional dynamical systems. Their biggest downside is the amount of
exploration they require before yielding high-performing policies. In a
lifelong learning setting, in which an agent is faced with multiple consecutive
tasks over its lifetime, reusing information from previously seen tasks can
substantially accelerate the learning of new tasks. We provide a novel method
for lifelong policy gradient learning that trains lifelong function
approximators directly via policy gradients, allowing the agent to benefit from
accumulated knowledge throughout the entire training process. We show
empirically that our algorithm learns faster and converges to better policies
than single-task and lifelong learning baselines, and completely avoids
catastrophic forgetting on a variety of challenging domains.
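The factored-policy idea behind this line of work can be illustrated with a minimal sketch: each task's policy parameters are composed as theta_t = L @ s_t, where L is a shared basis reused across tasks and s_t is a small task-specific coefficient vector, and s_t is trained directly with a policy-gradient-style estimator. This is only an illustrative toy, not the authors' algorithm: the paper also updates the shared knowledge during training, whereas here L is frozen for simplicity, the "environment" is a toy quadratic reward, and the gradient is a smoothed-reward (evolution-strategies-style) estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical factored policy: parameters for task t are theta_t = L @ s_t.
# L (d x k) is a shared basis; s_t (k,) are task-specific coefficients.
d, k = 4, 2
L = rng.normal(scale=0.5, size=(d, k))  # shared knowledge base (frozen here)

def reward(theta, target):
    # Toy stand-in for expected return: higher when theta is near the
    # task's target parameter vector.
    return -float(np.sum((theta - target) ** 2))

def train_task(L, target, steps=300, lr=0.05, sigma=0.1):
    """Train task-specific coefficients s with a smoothed-reward gradient.

    Uses antithetic perturbations: (r(theta + sigma*eps) - r(theta - sigma*eps))
    / (2*sigma) * eps is an unbiased estimate of the gradient of the
    Gaussian-smoothed reward w.r.t. theta; it is then chained through the
    factorization via dR/ds = L.T @ dR/dtheta.
    """
    s = np.zeros(k)
    for _ in range(steps):
        theta = L @ s
        eps = rng.normal(size=d)
        r_plus = reward(theta + sigma * eps, target)
        r_minus = reward(theta - sigma * eps, target)
        g_theta = (r_plus - r_minus) / (2.0 * sigma) * eps
        s += lr * (L.T @ g_theta)  # update only the task's coefficients
    return s

# Learn one task's coefficients against the frozen shared basis.
target = np.array([1.0, -1.0, 0.5, 0.0])
s = train_task(L, target)
final = reward(L @ s, target)
baseline = reward(L @ np.zeros(k), target)
```

Because each task only owns its small coefficient vector while the shared basis is read (and, in the full method, updated in a controlled way), learning a new task leaves previously learned coefficient vectors untouched, which is the structural reason a factored approach can avoid catastrophic forgetting.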
Updated: 2020-10-23