Modeling awake hippocampal reactivations with model-based bidirectional search

Khamassi, Mehdi; Girard, Benoît

doi:10.1007/s00422-020-00817-x

Original Article
Published: 17 February 2020

Biological Cybernetics, Volume 114, pages 231–248 (2020)

Abstract

Hippocampal offline reactivations during reward-based learning, usually categorized as replay events, have been found to be important for performance improvement over time and for memory consolidation. Recent computational work has linked these phenomena to the need to transform reward information into state-action values for decision making and to propagate it to all relevant states of the environment. Nevertheless, it is still unclear whether an integrated reinforcement learning mechanism could account for the variety of awake hippocampal reactivations, including variety in order (forward and reverse reactivated trajectories) and variety in the location where they occur (reward site or decision-point). Here, we present a model-based bidirectional search model that accounts for a variety of hippocampal reactivations. The model combines forward trajectory sampling from the current position and backward sampling through prioritized sweeping from states associated with large reward prediction errors until the two trajectories connect. This is repeated until stabilization of state-action values (convergence), which could explain why hippocampal reactivations drastically diminish when the animal’s performance stabilizes. Simulations in a multiple T-maze task show that forward reactivations are prominently found at decision-points while backward reactivations are exclusively generated at reward sites. Finally, the model can generate imaginary trajectories that the agent is not allowed to take during task performance. We raise some experimental predictions and implications for future studies of the role of the hippocampo–prefronto–striatal network in learning.
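
The combination of forward sampling and backward prioritized sweeping described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The toy T-maze, the function names (`forward_sample`, `backward_sweep`), the greedy forward policy, and the parameter values (`gamma`, `theta`, `depth`) are all assumptions made for illustration. The sketch runs a single forward/backward pass on a known deterministic model and stops the backward sweep as soon as it updates a state that the forward trajectory has visited.

```python
import heapq
import random

def actions(model, s):
    """Actions available in state s according to the (assumed known) model."""
    return [a for (s0, a) in model if s0 == s]

def value(q, model, s):
    """Greedy state value; 0.0 for terminal states that have no actions."""
    return max((q[(s, a)] for a in actions(model, s)), default=0.0)

def forward_sample(model, q, start, depth, rng):
    """Sample a trajectory from `start`, greedy in q with random tie-breaking."""
    path, s = [start], start
    for _ in range(depth):
        acts = actions(model, s)
        if not acts:
            break
        best = max(q[(s, a)] for a in acts)
        a = rng.choice([a for a in acts if q[(s, a)] == best])
        s = model[(s, a)]
        path.append(s)
    return path

def backward_sweep(model, reward, q, seeds, targets, gamma, theta):
    """Prioritized sweeping from `seeds` ({(s, a): priority}).

    Returns the list of updated states and whether the sweep reached a
    state in `targets` (i.e. connected with the forward trajectory).
    """
    queue = [(-p, s, a) for (s, a), p in seeds.items()]
    heapq.heapify(queue)  # min-heap on negated priority = max-priority first
    visited = []
    while queue:
        _, s, a = heapq.heappop(queue)
        # One-step Bellman backup using the model.
        q[(s, a)] = reward.get((s, a), 0.0) + gamma * value(q, model, model[(s, a)])
        visited.append(s)
        if s in targets:
            return visited, True
        # Queue predecessors whose predicted value changed by more than theta.
        for (sp, ap), sn in model.items():
            if sn == s:
                err = abs(reward.get((sp, ap), 0.0) + gamma * value(q, model, s) - q[(sp, ap)])
                if err > theta:
                    heapq.heappush(queue, (-err, sp, ap))
    return visited, False

# Hypothetical toy T-maze: 0 -> 1 -> 2 (junction), left arm 3, rewarded right arm 4.
model = {(0, "fwd"): 1, (1, "fwd"): 2, (2, "left"): 3, (2, "right"): 4}
reward = {(2, "right"): 1.0}
q = {sa: 0.0 for sa in model}

forward = forward_sample(model, q, start=0, depth=1, rng=random.Random(0))
backward, connected = backward_sweep(
    model, reward, q,
    seeds={(2, "right"): 1.0},  # transition with a large reward prediction error
    targets=set(forward), gamma=0.9, theta=1e-3,
)
print(forward, backward, connected)
```

In this toy case the forward trajectory covers states 0 and 1, and the backward sweep starts from the rewarded transition at the junction and stops once it updates state 1. Repeating such passes until the state-action values stop changing would correspond to the convergence criterion mentioned in the abstract; that outer loop is omitted here for brevity.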

Notes

The code is available at https://github.com/MehdiKhamassi/RLwithReplay.

Author information

Authors and Affiliations

Institute of Intelligent Systems and Robotics (ISIR), Sorbonne Université and CNRS (Centre National de la Recherche Scientifique), 75005, Paris, France
Mehdi Khamassi & Benoît Girard

Corresponding author

Correspondence to Mehdi Khamassi.

Additional information

Communicated by Jean-Marc Fellous.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to a Special Issue on Complex Spatial Navigation in Animals, Computational Models and Neuro-inspired Robots.

This work has received funding from the European Union’s Horizon 2020 Research and Innovation Program under Grant Agreement No. 640891 (DREAM Project), and from the CNRS 80|PRIME Research Program (RHiPAR Project). This work was performed within the Labex SMART (ANR-11-LABX-65) supported by French state funds managed by the ANR within the Investissements d’Avenir programme under reference ANR-11-IDEX-0004-02.

About this article

Cite this article

Khamassi, M., Girard, B. Modeling awake hippocampal reactivations with model-based bidirectional search. Biol Cybern 114, 231–248 (2020). https://doi.org/10.1007/s00422-020-00817-x

Received: 30 July 2019
Accepted: 21 January 2020
Published: 17 February 2020
Issue Date: April 2020
DOI: https://doi.org/10.1007/s00422-020-00817-x
