Modeling awake hippocampal reactivations with model-based bidirectional search

  • Original Article
Biological Cybernetics

Abstract

Hippocampal offline reactivations during reward-based learning, usually categorized as replay events, have been found to be important for performance improvement over time and for memory consolidation. Recent computational work has linked these phenomena to the need to transform reward information into state-action values for decision making and to propagate it to all relevant states of the environment. Nevertheless, it is still unclear whether an integrated reinforcement learning mechanism could account for the variety of awake hippocampal reactivations, including variety in order (forward and reverse reactivated trajectories) and variety in the location where they occur (reward site or decision-point). Here, we present a model-based bidirectional search model that accounts for a variety of hippocampal reactivations. The model combines forward trajectory sampling from the current position with backward sampling through prioritized sweeping from states associated with large reward prediction errors, until the two trajectories connect. This is repeated until the state-action values stabilize (convergence), which could explain why hippocampal reactivations drastically diminish when the animal’s performance stabilizes. Simulations in a multiple T-maze task show that forward reactivations are prominently found at decision-points while backward reactivations are exclusively generated at reward sites. Finally, the model can generate imaginary trajectories that the agent is not allowed to take during task performance. We raise some experimental predictions and implications for future studies of the role of the hippocampo–prefronto–striatal network in learning.
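The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (their code is linked in the Notes), and every name in it is hypothetical: the four-transition T-maze `MODEL`, the reward table `REWARD`, the threshold `theta`, and the step `budget` are all assumptions made for the example. The sketch also simplifies in three ways: the world model is deterministic, forward steps pick actions uniformly at random, and only backward steps update values.

```python
import heapq
import random

# Hypothetical toy T-maze: stem 0 -> 1 -> 2 (junction), arms 3 (left) and 4 (right).
MODEL = {(0, "up"): 1, (1, "up"): 2, (2, "left"): 3, (2, "right"): 4}
REWARD = {(2, "right"): 1.0}  # assumption: only the right arm is rewarded


def td_error(model, reward, Q, s, a, gamma):
    """Reward prediction error of (s, a) under the deterministic model."""
    s_next = model[(s, a)]
    next_values = [Q[(s2, a2)] for (s2, a2) in model if s2 == s_next]
    target = reward.get((s, a), 0.0) + gamma * max(next_values, default=0.0)
    return target - Q[(s, a)]


def bidirectional_search(model, reward, Q, start, gamma, theta, rng, budget=100):
    """One reactivation episode: alternate backward and forward steps until they meet."""
    # Seed the priority queue with every (s, a) whose error magnitude exceeds theta.
    queue = []
    for (s, a) in model:
        err = abs(td_error(model, reward, Q, s, a, gamma))
        if err > theta:
            heapq.heappush(queue, (-err, s, a))  # negate: heapq is a min-heap
    forward, backward, pos = [start], [], start
    for _ in range(budget):
        if not queue:
            break
        # Backward step (prioritized sweeping): update the highest-error pair,
        # then queue its predecessors if their error became large.
        _, s, a = heapq.heappop(queue)
        Q[(s, a)] += td_error(model, reward, Q, s, a, gamma)
        backward.append(s)
        for (p, b), nxt in model.items():
            if nxt == s:
                err = abs(td_error(model, reward, Q, p, b, gamma))
                if err > theta:
                    heapq.heappush(queue, (-err, p, b))
        # Forward step: sample one transition from the current simulated position.
        actions = [a2 for (s2, a2) in model if s2 == pos]
        if actions:
            pos = model[(pos, rng.choice(actions))]
            forward.append(pos)
        # Stop as soon as the two trajectories share a state.
        if set(forward) & set(backward):
            break
    return forward, backward


def replay_until_convergence(model, reward, start, gamma=0.9, theta=1e-3, seed=0):
    """Repeat reactivation episodes until no prediction error exceeds theta."""
    rng = random.Random(seed)
    Q = {sa: 0.0 for sa in model}
    episodes = 0
    while any(abs(td_error(model, reward, Q, s, a, gamma)) > theta
              for (s, a) in model):
        bidirectional_search(model, reward, Q, start, gamma, theta, rng)
        episodes += 1
    return Q, episodes
```

Calling `replay_until_convergence(MODEL, REWARD, start=0)` returns the learned value table and the number of reactivation episodes that were needed. Once no prediction error exceeds `theta`, the outer loop stops generating episodes, which mirrors the abstract's point that reactivations diminish when state-action values stabilize.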

Notes

  1. The code is available at https://github.com/MehdiKhamassi/RLwithReplay.

Author information

Corresponding author

Correspondence to Mehdi Khamassi.

Additional information

Communicated by Jean-Marc Fellous.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to a Special Issue on Complex Spatial Navigation in Animals, Computational Models and Neuro-inspired Robots.

This work has received funding from the European Union’s Horizon 2020 Research and Innovation Program under Grant Agreement No. 640891 (DREAM Project), and from the CNRS 80|PRIME Research Program (RHiPAR Project). This work was performed within the Labex SMART (ANR-11-LABX-65) supported by French state funds managed by the ANR within the Investissements d’Avenir programme under reference ANR-11-IDEX-0004-02.

About this article

Cite this article

Khamassi, M., Girard, B. Modeling awake hippocampal reactivations with model-based bidirectional search. Biol Cybern 114, 231–248 (2020). https://doi.org/10.1007/s00422-020-00817-x

  • DOI: https://doi.org/10.1007/s00422-020-00817-x

Keywords

Navigation