Reinforcement Learning in Sparse-Reward Environments With Hindsight Policy Gradients.
Neural Computation (IF 2.9). Pub Date: 2021-05-13. DOI: 10.1162/neco_a_01387
Paulo Rauber, Avinash Ummadisingu, Filipe Mutz, Jürgen Schmidhuber

A reinforcement learning agent that needs to pursue different goals across episodes requires a goal-conditional policy. In addition to their potential to generalize desirable behavior to unseen goals, such policies may also enable higher-level planning based on subgoals. In sparse-reward environments, the capacity to exploit information about the degree to which an arbitrary goal has been achieved while another goal was intended appears crucial to enabling sample-efficient learning. However, reinforcement learning agents have only recently been endowed with such capacity for hindsight. In this letter, we demonstrate how hindsight can be introduced to policy gradient methods, generalizing this idea to a broad class of successful algorithms. Our experiments on a diverse selection of sparse-reward environments show that hindsight leads to a remarkable increase in sample efficiency.
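
To make the idea concrete, here is a minimal sketch of a hindsight policy gradient update, assuming a tabular goal-conditional softmax policy on an invented toy chain environment. The environment, the `hindsight_gradient` helper, and all hyperparameters are illustrative, not from the paper. A single trajectory collected while pursuing one goal is reused to update the policy for every alternative goal, with a likelihood-ratio (importance) weight correcting for the fact that the actions were sampled under the originally intended goal:

```python
import numpy as np

# Illustrative toy problem (not from the paper): a 5-state chain in which
# the agent starts at state 0 and receives reward 1 on every step that
# lands on the goal state; all other rewards are 0 (sparse).
N_STATES, N_GOALS, N_ACTIONS, T = 5, 5, 2, 6
rng = np.random.default_rng(0)
theta = np.zeros((N_STATES, N_GOALS, N_ACTIONS))  # policy logits

def policy(state, goal):
    """Softmax action probabilities of the goal-conditional policy."""
    logits = theta[state, goal]
    p = np.exp(logits - logits.max())
    return p / p.sum()

def step(state, action):
    """Deterministic chain dynamics: action 1 moves right, 0 moves left."""
    return min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)

def rollout(goal):
    """Collect one episode while pursuing the intended goal."""
    s, traj = 0, []
    for _ in range(T):
        a = rng.choice(N_ACTIONS, p=policy(s, goal))
        s_next = step(s, a)
        traj.append((s, a, s_next))
        s = s_next
    return traj

def hindsight_gradient(traj, goal):
    """REINFORCE-style gradient that reuses one trajectory for every goal.

    For each alternative goal g2, the gradient terms are weighted by the
    trajectory-level importance ratio
        prod_t pi(a_t | s_t, g2) / pi(a_t | s_t, goal),
    which corrects for the actions having been sampled under `goal`.
    """
    grad = np.zeros_like(theta)
    for g2 in range(N_GOALS):
        w = np.prod([policy(s, g2)[a] / policy(s, goal)[a]
                     for s, a, _ in traj])
        ret = sum(1.0 for _, _, s_next in traj if s_next == g2)
        if ret == 0.0:
            continue  # zero return contributes nothing under sparse reward
        for s, a, _ in traj:
            p = policy(s, g2)
            dlog = -p
            dlog[a] += 1.0          # gradient of log-softmax w.r.t. logits
            grad[s, g2] += w * ret * dlog
    return grad

# Training loop: sample an intended goal, roll out, apply a hindsight update.
for _ in range(500):
    g = int(rng.integers(N_GOALS))
    theta += 0.05 * hindsight_gradient(rollout(g), g)
```

This trajectory-level weighting is only the simplest consistent estimator; the paper's actual estimators refine it, for example with per-decision importance weights and baselines, to reduce variance.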
