End-to-End Hierarchical Reinforcement Learning With Integrated Subgoal Discovery
IEEE Transactions on Neural Networks and Learning Systems (IF 10.4), Pub Date: 2021-06-22, DOI: 10.1109/tnnls.2021.3087733
Shubham Pateria, Budhitama Subagdja, Ah-Hwee Tan, Chai Quek

Hierarchical reinforcement learning (HRL) is a promising approach to long-horizon goal-reaching tasks: it decomposes each goal into subgoals. In a holistic HRL paradigm, an agent must autonomously discover such subgoals and also learn a hierarchy of policies that uses them to reach the goals. Recently introduced end-to-end HRL methods accomplish this by having the higher-level policy in the hierarchy search directly for useful subgoals in a continuous subgoal space. However, learning such a policy can be challenging when the subgoal space is large. We propose integrated discovery of salient subgoals (LIDOSS), an end-to-end HRL method with an integrated subgoal discovery heuristic that reduces the search space of the higher-level policy by explicitly focusing on the subgoals that have a greater probability of occurrence on various state-transition trajectories leading to the goal. We evaluate LIDOSS on a set of continuous control tasks in the MuJoCo domain against hierarchical actor-critic (HAC), a state-of-the-art end-to-end HRL method. The results show that LIDOSS attains better goal achievement rates than HAC in most of the tasks.
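
The abstract does not spell out how the probability of occurrence is estimated, so the following is only a toy sketch of the general idea, not the LIDOSS algorithm. It assumes discretized, hashable states and a small set of recorded trajectories, and it ranks intermediate states by the fraction of goal-reaching trajectories that pass through them. The function name `salient_subgoals`, the `top_k` parameter, and the demo data are all hypothetical.

```python
from collections import Counter


def salient_subgoals(trajectories, reached_goal, top_k=2):
    """Rank states by the fraction of goal-reaching trajectories visiting them.

    Illustrative assumption: states are discrete labels. The paper works in
    continuous MuJoCo state spaces, which this sketch does not model.
    """
    successful = [t for t, ok in zip(trajectories, reached_goal) if ok]
    if not successful:
        return []
    counts = Counter()
    for traj in successful:
        # Count each state once per trajectory; skip the start and goal states.
        counts.update(set(traj[1:-1]))
    n = len(successful)
    # Sort by descending visit count, breaking ties by state label.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(state, count / n) for state, count in ranked[:top_k]]


# Hypothetical demo data: three successful runs and one failed run.
demo_trajectories = [
    ["s0", "a", "door", "b", "goal"],
    ["s0", "c", "door", "d", "goal"],
    ["s0", "a", "e"],
    ["s0", "c", "door", "b", "goal"],
]
demo_reached_goal = [True, True, False, True]
demo_ranking = salient_subgoals(demo_trajectories, demo_reached_goal)
```

In this toy data, the state labeled `door` lies on every successful trajectory, so it ranks first. A higher-level policy restricted to such candidates would search a much smaller subgoal set than the full state space, which is the kind of reduction the abstract describes.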

Updated: 2021-06-22