Modeling awake hippocampal reactivations with model-based bidirectional search.
Biological Cybernetics (IF 1.7), Pub Date: 2020-02-17, DOI: 10.1007/s00422-020-00817-x
Mehdi Khamassi, Benoît Girard

Hippocampal offline reactivations during reward-based learning, usually categorized as replay events, have been found to be important for performance improvement over time and for memory consolidation. Recent computational work has linked these phenomena to the need to transform reward information into state-action values for decision making and to propagate it to all relevant states of the environment. Nevertheless, it is still unclear whether an integrated reinforcement learning mechanism could account for the variety of awake hippocampal reactivations, including variety in order (forward and reverse reactivated trajectories) and variety in the locations where they occur (reward sites or decision points). Here, we present a model-based bidirectional search model which accounts for a variety of hippocampal reactivations. The model combines forward trajectory sampling from the current position and backward sampling through prioritized sweeping from states associated with large reward prediction errors, until the two trajectories connect. This is repeated until stabilization of state-action values (convergence), which could explain why hippocampal reactivations drastically diminish when the animal's performance stabilizes. Simulations in a multiple T-maze task show that forward reactivations are prominently found at decision points while backward reactivations are exclusively generated at reward sites. Finally, the model can generate imaginary trajectories that the agent is not allowed to take during task performance. We raise some experimental predictions and implications for future studies of the role of the hippocampo-prefronto-striatal network in learning.
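As a rough illustration of the mechanism described in the abstract, the Python sketch below combines forward model rollouts from the agent's current position with backward prioritized-sweeping updates seeded by large reward prediction errors, repeating offline sweeps until the forward and backward samples connect and no high-priority errors remain. The linear-track environment, parameter values, and all function names are illustrative assumptions made for this sketch, not the authors' implementation or their multiple T-maze task.

```python
"""Minimal sketch of model-based bidirectional replay (illustrative assumptions only)."""
import random
from collections import defaultdict

# Hypothetical environment: a small linear track, reward on entering the last state.
N_STATES = 8
ACTIONS = [-1, +1]            # step left / step right
GAMMA, ALPHA = 0.95, 0.5
REWARD_STATE = N_STATES - 1

def step(s, a):
    """Deterministic world model, used both for acting and for offline replay."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == REWARD_STATE else 0.0)

Q = defaultdict(float)         # state-action values
priority = defaultdict(float)  # |reward prediction error| per state
predecessors = defaultdict(set)  # state -> set of (prev_state, action) pairs

def td_update(s, a, r, s2):
    """One Q-learning backup; returns the absolute prediction error."""
    target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
    delta = target - Q[(s, a)]
    Q[(s, a)] += ALPHA * delta
    return abs(delta)

def forward_sample(start, depth=4):
    """Forward 'reactivation': roll the model out from the current position."""
    visited, s = [], start
    for _ in range(depth):
        a = max(ACTIONS, key=lambda b: Q[(s, b)] + random.random() * 1e-3)
        s2, r = step(s, a)
        td_update(s, a, r, s2)
        visited.append(s)
        s = s2
    return visited

def backward_sweep(max_updates=4):
    """Backward 'reactivation': prioritized sweeping from high-error states."""
    visited = []
    for _ in range(max_updates):
        if not priority:
            break
        s2 = max(priority, key=priority.get)   # most surprising state first
        priority.pop(s2)
        visited.append(s2)
        for (s, a) in predecessors[s2]:
            _, r = step(s, a)
            err = td_update(s, a, r, s2)
            if err > 1e-3:
                priority[s] = max(priority[s], err)
    return visited

# Acting phase: random exploration records predecessors and seeds priorities.
s = 0
for _ in range(200):
    a = random.choice(ACTIONS)
    s2, r = step(s, a)
    predecessors[s2].add((s, a))
    err = td_update(s, a, r, s2)
    if err > 1e-3:
        priority[s] = max(priority[s], err)
    s = 0 if s2 == REWARD_STATE else s2

# Offline phase: alternate forward and backward sampling until the two
# trajectories connect and no large prediction errors remain (crude convergence test).
for sweep in range(50):
    fwd = forward_sample(start=0)
    bwd = backward_sweep()
    if set(fwd) & set(bwd) and not priority:
        print(f"converged after {sweep + 1} replay sweeps")
        break

print("greedy value at start state:", max(Q[(0, a)] for a in ACTIONS))
```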

Updated: 2020-04-23