Model based planners reflect on their model-free propensities


PLOS Computational Biology (IF 3.8), Pub Date: 2021-01-07, DOI: 10.1371/journal.pcbi.1008552
Rani Moran (1,2), Mehdi Keramati (1,2,3), Raymond J Dolan (1,2)


Dual-reinforcement learning theory proposes that behaviour is under the tutelage of a retrospective, value-caching, model-free (MF) system and a prospective, planning, model-based (MB) system. This architecture raises the question of how far a MB controller, when devising a plan, takes account of influences from its MF counterpart. We present evidence that a sophisticated, self-reflective MB planner anticipates the influence its own MF proclivities exert on the execution of its planned future actions. Using a novel bandit task, in which subjects were periodically allowed to design their environment, we show that reward assignments were constructed in a manner consistent with a MB system taking account of its MF propensities: participants assigned higher rewards to bandits that were momentarily associated with stronger MF tendencies. Our findings have implications for a range of decision-making domains, including drug abuse, pre-commitment, and the tension between short- and long-term decision horizons in economics.
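
The core intuition can be illustrated with a toy sketch. It is not the authors' task or model: the softmax choice rule, the inverse temperature `beta`, the cached values in `mf_values`, and the reward set are all assumptions made for illustration. If a planner expects its eventual choice to be pulled toward bandits with higher cached MF values, the reward assignment that maximises expected payoff places the largest rewards on the bandits it is most likely to pick.

```python
import math
from itertools import permutations


def softmax(values, beta=1.0):
    """Choice probabilities from cached MF values (assumed softmax rule)."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def expected_reward(assignment, choice_probs):
    """Expected payoff if bandit i is chosen with probability choice_probs[i]."""
    return sum(p * r for p, r in zip(choice_probs, assignment))


def best_assignment(mf_values, rewards, beta=1.0):
    """Brute-force search over all ways of assigning rewards to bandits.

    The planner here is 'self-reflective' only in the sense that it
    evaluates each assignment under its own anticipated MF-driven choices.
    """
    probs = softmax(mf_values, beta)
    return max(permutations(rewards), key=lambda a: expected_reward(a, probs))


# Hypothetical cached MF values for three bandits and three rewards to place.
mf_values = [0.2, 0.9, 0.5]
rewards = [0.0, 0.5, 1.0]

plan = best_assignment(mf_values, rewards, beta=3.0)
print(plan)  # (0.0, 1.0, 0.5): the largest reward goes to the bandit with the strongest MF value
```

In this sketch the ordering of assigned rewards follows the ordering of MF values, which mirrors the qualitative pattern reported in the abstract. How MF tendencies were actually measured and modelled is described in the paper itself.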


Updated: 2021-01-07
