Forgetful experience replay in hierarchical reinforcement learning from expert demonstrations
Knowledge-Based Systems (IF 7.2), Pub Date: 2021-02-12, DOI: 10.1016/j.knosys.2021.106844
Alexey Skrynnik, Aleksey Staroverov, Ermek Aitygulov, Kirill Aksenov, Vasilii Davydov, Aleksandr I. Panov

Deep reinforcement learning (RL) has shown impressive results in complex gaming and robotic environments. These results are usually achieved at the expense of huge computational costs and require an enormous number of episodes of interaction between the agent and the environment. Hierarchical methods and expert demonstrations are among the most promising approaches to improving the sample efficiency of reinforcement learning methods. In this paper, we propose a combination of methods that allows the agent to use low-quality demonstrations in complex vision-based environments with multiple related goals. Our Forgetful Experience Replay (ForgER) algorithm effectively handles errors in expert data and reduces quality losses when the action space and state representation are adapted to the agent's capabilities. The proposed goal-oriented replay buffer structure allows the agent to automatically identify sub-goals in the demonstrations for solving complex hierarchical tasks. Our method is highly versatile and can be integrated into various off-policy methods. ForgER outperforms existing state-of-the-art RL methods that use expert demonstrations in complex environments. A solution based on our algorithm outperforms the other solutions in the well-known MineRL competition and allows the agent to demonstrate expert-level behavior.
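To make the two mechanisms named in the abstract more concrete, here is a minimal Python sketch; it is not the authors' implementation. It assumes (hypothetically) that a sub-goal boundary in a demonstration is any step with a non-zero reward, and that "forgetting" means linearly lowering the fraction of expert transitions in each sampled batch. The names `split_by_subgoal`, `ForgetfulGoalReplay`, and their parameters are illustrative only; ForgER's actual segmentation rule and forgetting schedule may differ.

```python
import random
from collections import defaultdict


def split_by_subgoal(trajectory):
    """Split a demonstration of (obs, action, reward) steps into segments.

    Assumption (hypothetical): each non-zero reward marks the completion
    of a sub-goal, so a segment ends at every rewarded step.
    """
    segments, current = [], []
    for obs, action, reward in trajectory:
        current.append((obs, action, reward))
        if reward != 0:
            segments.append(current)
            current = []
    if current:  # trailing steps after the last reward form a final segment
        segments.append(current)
    return segments


class ForgetfulGoalReplay:
    """Hypothetical goal-oriented buffer that gradually 'forgets' expert data.

    Each sub-goal has separate expert and agent pools. The share of expert
    transitions per batch decays linearly from start_expert_frac to
    min_expert_frac over decay_steps calls to sample().
    """

    def __init__(self, start_expert_frac=1.0, min_expert_frac=0.0,
                 decay_steps=10_000, seed=0):
        self.expert = defaultdict(list)  # sub-goal id -> expert transitions
        self.agent = defaultdict(list)   # sub-goal id -> agent transitions
        self.start_frac = start_expert_frac
        self.min_frac = min_expert_frac
        self.decay_steps = decay_steps
        self.step = 0
        self.rng = random.Random(seed)

    def expert_fraction(self):
        # Linear schedule: an assumption, not the paper's exact rule.
        t = min(self.step / self.decay_steps, 1.0)
        return self.start_frac + (self.min_frac - self.start_frac) * t

    def add_expert(self, goal, transition):
        self.expert[goal].append(transition)

    def add_agent(self, goal, transition):
        self.agent[goal].append(transition)

    def sample(self, goal, batch_size):
        """Sample with replacement from one sub-goal's expert and agent pools."""
        expert_pool, agent_pool = self.expert[goal], self.agent[goal]
        if expert_pool and agent_pool:
            n_expert = round(batch_size * self.expert_fraction())
        elif expert_pool:
            n_expert = batch_size  # no agent data yet: demonstrations only
        else:
            n_expert = 0
        n_agent = batch_size - n_expert if agent_pool else 0
        self.step += 1  # advance the forgetting schedule
        return (self.rng.choices(expert_pool, k=n_expert)
                + self.rng.choices(agent_pool, k=n_agent))


# Usage: label each demonstration segment with its sub-goal index.
demo = [("o0", "a0", 0), ("o1", "a1", 1), ("o2", "a2", 0), ("o3", "a3", 1)]
buffer = ForgetfulGoalReplay(decay_steps=100)
for goal_id, segment in enumerate(split_by_subgoal(demo)):
    for transition in segment:
        buffer.add_expert(goal_id, transition)
buffer.add_agent(0, ("x0", "b0", 0))
batch = buffer.sample(goal=0, batch_size=4)
```

In this sketch, an off-policy learner (for example, a DQN-style agent) would call `sample(goal, batch_size)` for the currently active sub-goal. Early batches are dominated by demonstrations and later ones by the agent's own experience, which bounds how long errors in low-quality expert data keep influencing the updates.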



Updated: 2021-02-17