Hierarchical clustering optimizes the tradeoff between compositionality and expressivity of task structures for flexible reinforcement learning
Artificial Intelligence ( IF 14.4 ) Pub Date : 2022-08-05 , DOI: 10.1016/j.artint.2022.103770
Rex G Liu 1 , Michael J Frank 1

A hallmark of human intelligence, but challenging for reinforcement learning (RL) agents, is the ability to compositionally generalise, that is, to recompose familiar knowledge components in novel ways to solve new problems. For instance, when navigating in a city, one needs to know the location of the destination and how to operate a vehicle to get there, whether it be pedalling a bike or operating a car. In RL, these correspond to the reward function and transition function, respectively. To compositionally generalize, these two components need to be transferable independently of each other: multiple modes of transport can reach the same goal, and any given mode can be used to reach multiple destinations. Yet there are also instances where it can be helpful to learn and transfer entire structures, jointly representing goals and transitions, particularly whenever these recur in natural tasks (e.g., given a suggestion to get ice cream, one might prefer to bike, even in new towns). Prior theoretical work has explored how, in model-based RL, agents can learn and generalize task components (transition and reward functions). But a satisfactory account for how a single agent can simultaneously satisfy the two competing demands is still lacking. Here, we propose a hierarchical RL agent that learns and transfers individual task components as well as entire structures (particular compositions of components) by inferring both through a non-parametric Bayesian model of the task. It maintains a factorised representation of task components through a hierarchical Dirichlet process, but it also represents different possible covariances between these components through a standard Dirichlet process. We validate our approach on a variety of navigation tasks covering a wide range of statistical correlations between task components and show that it can also improve generalisation and transfer in more complex, hierarchical tasks with goal/subgoal structures. Finally, we end with a discussion of our work including how this clustering algorithm could conceivably be implemented by cortico-striatal gating circuits in the brain.
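As a rough illustration of the clustering idea described in the abstract (not the authors' implementation; the function names, the concentration parameter, and the use of a plain Chinese Restaurant Process are simplifying assumptions), the sketch below contrasts a factorised prior, which clusters transition and reward functions independently so that familiar components can be recombined, with a joint prior that clusters whole (transition, reward) structures and can therefore capture correlations between them:

import numpy as np

def crp_assignment_probs(counts, alpha):
    """Chinese Restaurant Process prior: probability of joining each existing
    cluster (proportional to its size) or opening a new one (proportional to alpha)."""
    counts = np.asarray(counts, dtype=float)
    probs = np.append(counts, alpha)
    return probs / probs.sum()

def sample_cluster(counts, alpha, rng):
    """Sample a cluster index under the CRP prior; index len(counts) means 'new cluster'."""
    probs = crp_assignment_probs(counts, alpha)
    return rng.choice(len(probs), p=probs)

rng = np.random.default_rng(0)
alpha = 1.0  # concentration parameter (assumed value, for illustration only)

# Factorised prior: transition clusters and reward clusters are drawn independently,
# so a familiar transition function (mode of transport) can be recombined with a
# novel reward function (destination).
transition_counts, reward_counts = [], []
factorised_tasks = []
for _ in range(10):
    t = sample_cluster(transition_counts, alpha, rng)
    r = sample_cluster(reward_counts, alpha, rng)
    if t == len(transition_counts):
        transition_counts.append(0)
    if r == len(reward_counts):
        reward_counts.append(0)
    transition_counts[t] += 1
    reward_counts[r] += 1
    factorised_tasks.append((t, r))

# Joint prior: whole (transition, reward) structures are clustered together,
# capturing recurring covariances such as "ice cream trips tend to go by bike".
joint_counts = []
joint_tasks = []
for _ in range(10):
    c = sample_cluster(joint_counts, alpha, rng)
    if c == len(joint_counts):
        joint_counts.append(0)
    joint_counts[c] += 1
    joint_tasks.append(c)

print("factorised (transition, reward) assignments:", factorised_tasks)
print("joint structure assignments:", joint_tasks)

The full model in the paper combines the two levels rather than treating them as separate alternatives as in this toy sketch: a hierarchical Dirichlet process maintains the factorised representation of components, while a standard Dirichlet process represents the possible covariances between them.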




Updated: 2022-08-05