Temporal and state abstractions for efficient learning, transfer, and composition in humans.
Psychological Review (IF 5.4), Pub Date: 2021-05-20, DOI: 10.1037/rev0000295
Liyu Xia, Anne G. E. Collins

Humans use prior knowledge to efficiently solve novel tasks, but how they structure past knowledge during learning to enable such fast generalization is not well understood. We recently proposed that hierarchical state abstraction enabled generalization of simple one-step rules, by inferring context clusters for each rule. However, humans' daily tasks are often temporally extended, and necessitate more complex multi-step, hierarchically structured strategies. The options framework in hierarchical reinforcement learning provides a theoretical framework for representing such transferable strategies. Options are abstract multi-step policies, assembled from simpler one-step actions or other options, that can represent meaningful reusable strategies as temporal abstractions. We developed a novel sequential decision-making protocol to test if humans learn and transfer multi-step options. In a series of four experiments, we found transfer effects at multiple hierarchical levels of abstraction that could not be explained by flat reinforcement learning models or hierarchical models lacking temporal abstractions. We extended the options framework to develop a quantitative model that blends temporal and state abstractions. Our model captures the transfer effects observed in human participants. Our results provide evidence that humans create and compose hierarchical options, and use them to explore in novel contexts, consequently transferring past knowledge and speeding up learning. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
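
As an illustration only (not the authors' model or task), the idea of options assembled from one-step actions or other options can be sketched in Python. The sketch assumes a deterministic, open-loop simplification in which an option is just a fixed sequence of steps; the full options framework instead defines an option by an initiation set, a state-dependent policy, and a termination condition. The class name `Option`, the method `flatten`, and the action labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Option:
    """Hypothetical, simplified option: a named, fixed sequence of steps.

    Each step is either a primitive one-step action (a string label)
    or another Option, so options can be nested to any depth.
    """
    name: str
    steps: Tuple[Union[str, "Option"], ...]

    def flatten(self) -> List[str]:
        """Expand nested options into the primitive actions they execute."""
        actions: List[str] = []
        for step in self.steps:
            if isinstance(step, Option):
                # Reuse a lower-level option as a single building block.
                actions.extend(step.flatten())
            else:
                actions.append(step)
        return actions


# Two low-level options built from primitive actions (labels are made up).
low_1 = Option("low_1", ("A", "B"))
low_2 = Option("low_2", ("C", "D"))

# A higher-level option composed of the two above plus one primitive action.
high = Option("high", (low_1, low_2, "E"))

# The same low-level option can be reused in a different composition.
high_new = Option("high_new", (low_2, low_1))

print(high.flatten())      # ['A', 'B', 'C', 'D', 'E']
print(high_new.flatten())  # ['C', 'D', 'A', 'B']
```

Reusing `low_1` and `low_2` inside `high_new` without redefining them is the kind of transfer through composition that the abstract describes, here reduced to a toy data structure with no learning involved.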

Updated: 2021-05-20