Human subjects exploit a cognitive map for credit assignment [Psychological and Cognitive Sciences]
Proceedings of the National Academy of Sciences of the United States of America ( IF 9.4 ) Pub Date : 2021-01-26 , DOI: 10.1073/pnas.2016884118
Rani Moran 1, 2 , Peter Dayan 3, 4 , Raymond J Dolan 2, 5
An influential reinforcement learning framework proposes that behavior is jointly governed by model-free (MF) and model-based (MB) controllers. The former learns the values of actions directly from past encounters, and the latter exploits a cognitive map of the task to calculate these values prospectively. Considerable attention has been paid to how these systems interact during choice, but how and whether knowledge of a cognitive map contributes to the way MF and MB controllers assign credit (i.e., to how they revaluate actions and states following the receipt of an outcome) remains underexplored. Here, we examine such sophisticated credit assignment using a dual-outcome bandit task. We provide evidence that knowledge of a cognitive map influences credit assignment in both MF and MB systems, mediating subtly different aspects of apparent relevance. Specifically, we show that MF credit assignment was enhanced for rewards that were related to a choice, whereas choice-unrelated rewards negatively reinforced subsequent choices. This modulation is only possible based on knowledge of task structure. On the other hand, MB credit assignment was boosted for outcomes that impacted the differences in value between the offered bandits. We consider mechanistic accounts and the normative status of these findings. We suggest the findings extend the scope and sophistication of cognitive map-based credit assignment during reinforcement learning, with implications for understanding behavioral control.
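To make the distinction concrete, the following is a minimal sketch (not the authors' model) contrasting the two controllers the abstract describes: an MF controller that caches action values and updates them retrospectively via a delta rule, versus an MB controller that computes values prospectively from a cognitive map (a transition model plus outcome values). All names, the learning rate, and the toy task structure are illustrative assumptions.

```python
ALPHA = 0.5  # illustrative MF learning rate

def mf_update(q, chosen, reward, alpha=ALPHA):
    """Model-free credit assignment: nudge the cached value of the
    chosen action toward the received reward (delta rule)."""
    q = dict(q)
    q[chosen] += alpha * (reward - q[chosen])
    return q

def mb_values(transition, reward_by_state):
    """Model-based evaluation: compute action values prospectively
    from a cognitive map (action -> outcome-state probabilities),
    rather than from cached past outcomes."""
    return {
        action: sum(p * reward_by_state[s] for s, p in outcomes.items())
        for action, outcomes in transition.items()
    }

# MF: cached values start equal; choosing "left" and receiving reward 1
# shifts only the chosen action's cached value.
q = {"left": 0.0, "right": 0.0}
q = mf_update(q, "left", 1.0)  # q["left"] moves to 0.5

# MB: a toy cognitive map of two bandits over two outcome states.
transition = {"left": {"A": 0.8, "B": 0.2},
              "right": {"A": 0.2, "B": 0.8}}
reward_by_state = {"A": 1.0, "B": 0.0}
v = mb_values(transition, reward_by_state)  # {'left': 0.8, 'right': 0.2}
```

The sketch shows why credit assignment can differ between the systems: the MF update touches only the chosen action's cache, while the MB values change immediately whenever the map or the outcome values are revised.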



Updated: 2021-01-22