Addressing Hindsight Bias in Multigoal Reinforcement Learning
IEEE Transactions on Cybernetics (IF 11.8) Pub Date: 2021-09-08, DOI: 10.1109/tcyb.2021.3107202
Chenjia Bai, Lingxiao Wang, Yixin Wang, Zhaoran Wang, Rui Zhao, Chenyao Bai, Peng Liu

Multigoal reinforcement learning (RL) extends standard RL with goal-conditioned value functions and policies. One efficient multigoal RL algorithm is hindsight experience replay (HER). By treating a hindsight goal from failed experiences as the original goal, HER enables the agent to receive rewards frequently. However, a key assumption of HER is that hindsight goals do not change the likelihood of the sampled transitions and trajectories used in training, which, according to our analysis, does not hold. More specifically, we show that using hindsight goals changes this likelihood and results in a biased learning objective for multigoal RL. We analyze the hindsight bias that arises from the use of hindsight goals and propose bias-corrected HER (BHER), an efficient algorithm that corrects the hindsight bias during training. We further show that BHER outperforms several state-of-the-art multigoal RL approaches on challenging robotics tasks.
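For readers unfamiliar with the relabeling mechanism the abstract describes, below is a minimal sketch of HER's standard "future" strategy, which BHER builds on and corrects. It is illustrative only, not the paper's method: the transition layout, the sparse_reward threshold, and the future_k parameter are assumptions, not details from the paper.

```python
import numpy as np

def sparse_reward(achieved_goal, goal, eps=0.05):
    # Sparse reward typical of multigoal robotics benchmarks:
    # 0 on success, -1 otherwise (the threshold eps is an assumption).
    return 0.0 if np.linalg.norm(achieved_goal - goal) < eps else -1.0

def her_relabel(trajectory, reward_fn=sparse_reward, future_k=4,
                rng=np.random.default_rng()):
    """Relabel one episode with HER's 'future' strategy.

    `trajectory` is a list of tuples
    (obs, goal, action, next_obs, next_achieved_goal); this layout is an
    assumption for the sketch. Each transition is stored once with its
    original goal and future_k more times with hindsight goals drawn from
    achieved goals later in the same episode, with rewards recomputed
    against the substituted goal.
    """
    relabeled = []
    T = len(trajectory)
    for t, (obs, goal, action, next_obs, next_ag) in enumerate(trajectory):
        # Original transition: reward judged against the original goal.
        relabeled.append((obs, goal, action,
                          reward_fn(next_ag, goal), next_obs))
        for _ in range(future_k):
            # Pick an achieved goal from a later step as the hindsight goal;
            # failed episodes thus yield frequent successful transitions.
            future_t = rng.integers(t, T)
            hindsight_goal = trajectory[future_t][4]
            relabeled.append((obs, hindsight_goal, action,
                              reward_fn(next_ag, hindsight_goal), next_obs))
    return relabeled
```

As the abstract points out, the relabeled transitions are not distributed as if the hindsight goals had been the intended goals all along, so training on them unweighted biases the learning objective; correcting that bias is the contribution of BHER.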

Updated: 2021-09-08