Bias-reduced hindsight experience replay with virtual goal prioritization
Neurocomputing (IF 5.5), Pub Date: 2021-03-10, DOI: 10.1016/j.neucom.2021.02.090
B. Manela, A. Biess

Hindsight Experience Replay (HER) is a multi-goal reinforcement learning algorithm for sparse reward functions. The algorithm treats every failure as a success for an alternative (virtual) goal that was achieved in the episode. Virtual goals are selected at random, irrespective of which are most instructive for the agent. In this paper, we present two improvements over the existing HER algorithm. First, we prioritize virtual goals from which the agent will learn the most valuable information. We call this property the instructiveness of the virtual goal and define it by a heuristic measure that expresses how well the agent will be able to generalize from that virtual goal to actual goals. Second, we reduce existing bias in HER by removing misleading samples. To test our algorithms, we built three challenging environments with sparse reward functions. Our empirical results in all three environments show a vast improvement in final success rate and sample efficiency compared to the original HER algorithm. A video showing experimental results is available at https://youtu.be/xjAiwJiSeLc.
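The hindsight relabeling idea described above can be sketched as follows. This is a minimal, illustrative sketch of standard HER relabeling (the "future" strategy) with an optional weighted sampling hook that loosely mirrors the paper's prioritization of instructive virtual goals; the paper's actual instructiveness measure and bias-removal step are not reproduced here, and all names (`her_relabel`, `priority_fn`, the tuple layout) are assumptions, not the authors' implementation.

```python
import random

def sparse_reward(achieved_goal, goal):
    # Sparse reward: 0 on success, -1 otherwise.
    return 0.0 if achieved_goal == goal else -1.0

def her_relabel(episode, reward_fn, k=4, priority_fn=None):
    """Relabel transitions with virtual goals (HER 'future' strategy).

    episode: list of (state, action, achieved_goal, goal) tuples.
    priority_fn: optional heuristic scoring candidate virtual goals;
    when given, candidates are sampled in proportion to their score,
    loosely mirroring the paper's 'instructiveness' prioritization
    (the paper defines its own measure; this hook is illustrative).
    """
    relabeled = []
    for t, (state, action, achieved, _) in enumerate(episode):
        # Candidate virtual goals: goals actually achieved from step t onward.
        candidates = [ag for (_, _, ag, _) in episode[t:]]
        for _ in range(min(k, len(candidates))):
            if priority_fn is None:
                virtual = random.choice(candidates)          # vanilla HER
            else:
                weights = [priority_fn(g) for g in candidates]
                virtual = random.choices(candidates, weights=weights, k=1)[0]
            relabeled.append((state, action, virtual,
                              reward_fn(achieved, virtual)))
    return relabeled
```

In vanilla HER the `random.choice` branch is used; the paper's first contribution replaces that uniform draw with a preference for goals the agent can generalize from, which the `priority_fn` hook stands in for here.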




Updated: 2021-05-11