Abstract
Hippocampal offline reactivations during reward-based learning, usually categorized as replay events, have been found to be important for performance improvement over time and for memory consolidation. Recent computational work has linked these phenomena to the need to transform reward information into state-action values for decision making and to propagate it to all relevant states of the environment. Nevertheless, it remains unclear whether an integrated reinforcement learning mechanism could account for the variety of awake hippocampal reactivations, including variety in order (forward and reverse reactivated trajectories) and in the location where they occur (reward site or decision point). Here, we present a model relying on model-based bidirectional search that accounts for this variety of hippocampal reactivations. The model combines forward trajectory sampling from the current position with backward sampling, through prioritized sweeping, from states associated with large reward prediction errors, until the two trajectories connect. This process is repeated until the state-action values stabilize (convergence), which could explain why hippocampal reactivations drastically diminish once the animal's performance stabilizes. Simulations in a multiple T-maze task show that forward reactivations are prominently found at decision points, while backward reactivations are exclusively generated at reward sites. Finally, the model can generate imaginary trajectories that the agent is not allowed to take during task performance. We derive experimental predictions and discuss implications for future studies of the role of the hippocampo–prefronto–striatal network in learning.
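The bidirectional scheme summarized above can be sketched in tabular form. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation (their code is in the repository listed under Notes). It assumes a hypothetical deterministic linear track in place of the multiple T-maze. It also assumes that forward and backward steps both perform full Bellman backups on a known world model, that forward sampling follows the greedy action from the current position, and that backward sampling is prioritized sweeping seeded with every transition whose absolute reward prediction error (RPE) exceeds a threshold. The names `build_track`, `bidirectional_replay`, `run_until_convergence`, `GAMMA` and `THETA`, and their values, are illustrative.

```python
import heapq

GAMMA = 0.9      # discount factor (illustrative value, an assumption)
THETA = 1e-3     # minimal |RPE| for a transition to enter the backward queue (assumption)
N_ACTIONS = 2    # 0 = step left, 1 = step right


def build_track(n_states):
    """Hypothetical deterministic world model of a linear track.

    Returns (model, preds): model maps (s, a) -> (s_next, reward) and preds
    maps s_next -> [(s, a), ...]. The last state is a rewarded terminal state.
    """
    model, preds = {}, {}
    goal = n_states - 1
    for s in range(goal):
        for a, s_next in ((0, max(s - 1, 0)), (1, s + 1)):
            model[(s, a)] = (s_next, 1.0 if s_next == goal else 0.0)
            preds.setdefault(s_next, []).append((s, a))
    return model, preds


def value(Q, s):
    # Greedy state value; terminal or unvisited pairs default to 0.
    return max(Q.get((s, a), 0.0) for a in range(N_ACTIONS))


def rpe(Q, model, s, a):
    # Reward prediction error of a full Bellman backup on the known model.
    s_next, r = model[(s, a)]
    return r + GAMMA * value(Q, s_next) - Q.get((s, a), 0.0)


def bidirectional_replay(Q, model, preds, start, max_steps=50):
    """One replay event: interleave forward and backward sampling until they meet.

    Updates Q in place and returns the largest absolute change made.
    """
    # Backward side: prioritized sweeping queue seeded with every transition
    # whose |RPE| exceeds THETA (negated priorities make heapq a max-heap).
    queue = []
    for (s, a) in model:
        priority = abs(rpe(Q, model, s, a))
        if priority > THETA:
            queue.append((-priority, s, a))
    heapq.heapify(queue)

    s_fwd, biggest = start, 0.0
    forward_seen, backward_seen = {start}, set()
    for _ in range(max_steps):
        # Forward step: follow the greedy action from the current forward state
        # (skipped once the terminal goal state, which has no actions, is reached).
        if (s_fwd, 0) in model:
            a = max(range(N_ACTIONS), key=lambda b: Q.get((s_fwd, b), 0.0))
            delta = rpe(Q, model, s_fwd, a)
            Q[(s_fwd, a)] = Q.get((s_fwd, a), 0.0) + delta
            biggest = max(biggest, abs(delta))
            s_fwd = model[(s_fwd, a)][0]
            forward_seen.add(s_fwd)
        # Backward step: back up the highest-priority transition, then queue
        # the predecessors of its source state whose |RPE| is now large.
        if queue:
            _, s, a = heapq.heappop(queue)
            delta = rpe(Q, model, s, a)
            Q[(s, a)] = Q.get((s, a), 0.0) + delta
            biggest = max(biggest, abs(delta))
            backward_seen.add(s)
            for p, pa in preds.get(s, []):
                priority = abs(rpe(Q, model, p, pa))
                if priority > THETA:
                    heapq.heappush(queue, (-priority, p, pa))
        # The event ends once the two sampled trajectories share a state.
        if forward_seen & backward_seen:
            break
    return biggest


def run_until_convergence(n_states=6, start=0, tol=1e-6, max_events=100):
    # Repeat replay events until one of them no longer changes any Q-value.
    model, preds = build_track(n_states)
    Q = {}
    for event in range(1, max_events + 1):
        if bidirectional_replay(Q, model, preds, start) < tol:
            return Q, event
    return Q, max_events


if __name__ == "__main__":
    Q, n_events = run_until_convergence()
    print(n_events)  # 4
    print([round(Q[(s, 1)], 4) for s in range(5)])  # [0.6561, 0.729, 0.81, 0.9, 1.0]
```

In this toy run, the first event starts at the transition into the reward and propagates value backward until it reaches the start state, where the forward trajectory sits. The next two events fill in the remaining leftward actions, and the fourth event changes nothing, so replay stops and the greedy policy moves right in every state.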
Notes
The code is available at https://github.com/MehdiKhamassi/RLwithReplay.
References
Arleo A, Gerstner W (2000) Spatial cognition and neuro-mimetic navigation: a model of hippocampal place cell activity. Biol Cybern 83(3):287–299
Aubin L, Khamassi M, Girard B (2018) Prioritized sweeping neural DynaQ with multiple predecessors, and hippocampal replays. In: Conference on biomimetic and biohybrid systems. Springer, pp 16–27
Barto AG (1995) Adaptive critics and the basal ganglia. In: Houk JC, Davis JL, Beiser DG (eds) Models of information processing in the Basal Ganglia. The MIT Press, Cambridge, pp 215–232
Barto AG, Bradtke SJ, Singh SP (1995) Learning to act using real-time dynamic programming. Artif Intell 72(1–2):81–138
Battaglia FP, Peyrache A, Khamassi M, Wiener SI et al (2008) Spatial decisions and neuronal activity in hippocampal projection zones in prefrontal cortex and striatum. In: Hippocampal place fields: relevance to learning and memory, pp 289–311
Benchenane K, Peyrache A, Khamassi M, Tierney PL, Gioanni Y, Battaglia FP, Wiener SI (2010) Coherent theta oscillations and reorganization of spike timing in the hippocampal-prefrontal network upon learning. Neuron 66(6):921–936
Bhalla US (2019) Dendrites, deep learning, and sequences in the hippocampus. Hippocampus 29(3):239–251
Buzsáki G (1989) Two-stage model of memory trace formation: a role for “noisy” brain states. Neuroscience 31(3):551–570
Caluwaerts K, Staffa M, N’Guyen S, Grand C, Dollé L, Favre-Félix A, Girard B, Khamassi M (2012) A biologically inspired meta-control navigation system for the psikharpax rat robot. Bioinspir Biomim 7(2):025009
Cazé R, Khamassi M, Aubin L, Girard B (2018) Hippocampal replays under the scrutiny of reinforcement learning models. J Neurophysiol 120(6):2877–2896
Cisek P, Puskas GA, El-Murr S (2009) Decisions in changing conditions: the urgency-gating model. J Neurosci 29(37):11560–11571
Cutsuridis V, Hasselmo M (2011) Spatial memory sequence encoding and replay during modeled theta and ripple oscillations. Cognit Comput 3(4):554–574
Daw ND, Niv Y, Dayan P (2005) Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat Neurosci 8(12):1704
de Lavilléon G, Lacroix MM, Rondi-Reig L, Benchenane K (2015) Explicit memory creation during sleep demonstrates a causal role of place cells in navigation. Nat Neurosci 18(4):493–495
Diba K, Buzsáki G (2007) Forward and reverse hippocampal place-cell sequences during ripples. Nat Neurosci 10(10):1241
Dollé L, Sheynikhovich D, Girard B, Chavarriaga R, Guillot A (2010) Path planning versus cue responding: a bio-inspired model of switching between navigation strategies. Biol Cybern 103(4):299–317
Dollé L, Chavarriaga R, Guillot A, Khamassi M (2018) Interactions of spatial strategies producing generalization gradient and blocking: a computational approach. PLoS Comput Biol 14(4):e1006092
Dollé L, Khamassi M, Girard B, Guillot A, Chavarriaga R (2008) Analyzing interactions between navigation strategies using a computational model of action selection. In: International conference on spatial cognition. Springer, pp 71–86
Foster DJ (2017) Replay comes of age. Annu Rev Neurosci 40:581–602
Foster DJ, Wilson MA (2006) Reverse replay of behavioural sequences in hippocampal place cells during the awake state. Nature 440(7084):680–683
Foster D, Morris R, Dayan P (2000) A model of hippocampally dependent navigation, using the temporal difference learning rule. Hippocampus 10(1):1–16
Frank MJ, Claus ED (2006) Anatomy of a decision: striato-orbitofrontal interactions in reinforcement learning, decision making, and reversal. Psychol Rev 113(2):300
Frankland PW, Bontempi B (2005) The organization of recent and remote memories. Nat Rev Neurosci 6(2):119–130
Girardeau G, Benchenane K, Wiener SI, Buzsáki G, Zugaro MB (2009) Selective suppression of hippocampal ripples impairs spatial memory. Nat Neurosci 12(10):1222–1223
Guazzelli A, Bota M, Corbacho FJ, Arbib MA (1998) Affordances, motivations, and the world graph theory. Adapt Behav 6(3–4):435–471
Gupta AS, van der Meer MAA, Touretzky DS, Redish AD (2010) Hippocampal replay is not a simple function of experience. Neuron 65(5):695–705
Jadhav SP, Kemere C, German PW, Frank LM (2012) Awake hippocampal sharp-wave ripples support spatial memory. Science 336(6087):1454–1458
Jahnke S, Timme M, Memmesheimer RM (2015) A unified dynamic model for learning, replay, and sharp-wave/ripples. J Neurosci 35(49):16236–16258
Johnson A, Redish AD (2005) Hippocampal replay contributes to within session learning in a temporal difference reinforcement learning model. Neural Netw 18(9):1163–1171
Johnson A, Redish AD (2007) Neural ensembles in CA3 transiently encode paths forward of the animal at a decision point. J Neurosci 27(45):12176–12189
Johnson A, van der Meer MA, Redish AD (2007) Integrating hippocampus and striatum in decision-making. Curr Opin Neurobiol 17(6):692–697
Jones JL, Esber GR, McDannald MA, Gruber AJ, Hernandez A, Mirenzi A, Schoenbaum G (2012) Orbitofrontal cortex supports behavior and learning using inferred but not cached values. Science 338(6109):953–956
Karlsson MP, Frank LM (2009) Awake replay of remote experiences in the hippocampus. Nat Neurosci 12(7):913
Khamassi M, Humphries MD (2012) Integrating cortico-limbic-basal ganglia architectures for learning model-based and model-free navigation strategies. Front Behav Neurosci 6:79
Khamassi M, Quilodran R, Enel P, Dominey P, Procyk E (2015) Behavioral regulation and the modulation of information coding in the lateral prefrontal and cingulate cortex. Cereb Cortex 25(9):3197–3218
Klein-Flügge MC, Barron HC, Brodersen KH, Dolan RJ, Behrens TEJ (2013) Segregated encoding of reward-identity and stimulus-reward associations in human orbitofrontal cortex. J Neurosci 33(7):3202–3211
Lansink CS, Goltstein PM, Lankelma JV, McNaughton BL, Pennartz CMA (2009) Hippocampus leads ventral striatum in replay of place-reward information. PLoS Biol 7(8):e1000173
Lee AK, Wilson MA (2002) Memory of sequential experience in the hippocampus during slow wave sleep. Neuron 36(6):1183–1194
Levy WB (1996) A sequence predicting CA3 is a flexible associator that learns and uses context to solve hippocampal-like tasks. Hippocampus 6(6):579–590
Lin LJ (1992) Self-improving reactive agents based on reinforcement learning, planning and teaching. Mach Learn 8(3–4):69–97
Maingret N, Girardeau G, Todorova R, Goutierre M, Zugaro M (2016) Hippocampo-cortical coupling mediates memory consolidation during sleep. Nat Neurosci 19(7):959–964
Mattar MG, Daw ND (2018) Prioritized memory access explains planning and hippocampal replay. Nat Neurosci 21(11):1609
Miller EK, Cohen JD (2001) An integrative theory of prefrontal cortex function. Annu Rev Neurosci 24(1):167–202
Moore AW, Atkeson CG (1993) Prioritized sweeping: reinforcement learning with less data and less time. Mach Learn 13(1):103–130
O’Keefe J, Dostrovsky J (1971) The hippocampus as a spatial map: preliminary evidence from unit activity in the freely-moving rat. Brain Res 34(1):171–175
Ólafsdóttir HF, Barry C, Saleem AB, Hassabis D, Spiers HJ (2015) Hippocampal place cells construct reward related sequences through unexplored space. eLife 4:e06063
Ólafsdóttir HF, Bush D, Barry C (2018) The role of hippocampal replay in memory and planning. Curr Biol 28(1):R37–R50
Palminteri S, Lefebvre G, Kilford EJ, Blakemore SJ (2017) Confirmation bias in human reinforcement learning: evidence from counterfactual feedback processing. PLoS Comput Biol 13(8):e1005684
Papale AE, Zielinski MC, Frank LM, Jadhav SP, Redish AD (2016) Interplay between hippocampal sharp-wave-ripple events and vicarious trial and error behaviors in decision making. Neuron 92(5):1–8
Park SA, Miller DS, Nili H, Ranganath C, Boorman ED (2019) Map making: constructing, combining, and navigating abstract cognitive maps. bioRxiv 810051
Pasupathy A, Miller EK (2005) Different time courses of learning-related activity in the prefrontal cortex and striatum. Nature 433(7028):873
Peng J, Williams RJ (1993) Efficient learning and planning within the Dyna framework. Adapt Behav 1(4):437–454
Peyrache A, Khamassi M, Benchenane K, Wiener SI, Battaglia FP (2009) Replay of rule-learning related neural patterns in the prefrontal cortex during sleep. Nat Neurosci 12(7):919–926
Pezzulo G, Rigoli F, Chersi F (2013) The mixed instrumental controller: using value of information to combine habitual choice and mental simulation. Front Psychol 4:212
Pezzulo G, van der Meer MAA, Lansink CS, Pennartz CMA (2014) Internally generated sequences in learning and executing goal-directed behavior. Trends Cognit Sci 18(12):647–657
Pezzulo G, Kemere C, Van Der Meer MA (2017) Internally generated hippocampal sequences as a vantage point to probe future-oriented cognition. Ann N Y Acad Sci 1396(1):144–165
Pfeiffer BE, Foster DJ (2013) Hippocampal place-cell sequences depict future paths to remembered goals. Nature 497(7447):74
Pohl I (1971) Bi-directional search. Mach Intell 6:127–140
Redish AD (2016) Vicarious trial and error. Nat Rev Neurosci 17(3):147–159
Renaudo E, Girard B, Chatila R, Khamassi M (2014) Design of a control architecture for habit learning in robots. In: Conference on biomimetic and biohybrid systems. Springer, pp 249–260
Rennó-Costa C, da Silva ACC, Blanco W, Ribeiro S (2019) Computational models of memory consolidation and long-term synaptic plasticity during sleep. Neurobiol Learn Mem 160:32–47
Roumis DK, Frank LM (2015) Hippocampal sharp-wave ripples in waking and sleeping states. Curr Opin Neurobiol 35:6–12
Saravanan V, Arabali D, Jochems A, Cui AX, Gootjes-Dreesbach L, Cutsuridis V, Yoshida M (2015) Transition between encoding and consolidation/replay dynamics via cholinergic modulation of CAN current: a modeling study. Hippocampus 25(9):1052–1070
Schultz W, Dayan P, Montague PR (1997) A neural substrate of prediction and reward. Science 275:1593–1599
Stachenfeld KL, Botvinick MM, Gershman SJ (2017) The hippocampus as a predictive map. Nat Neurosci 20(11):1643
Sutton RS (1990) Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In: Proceedings of the seventh international conference on machine learning, pp 216–224
Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge
van der Meer M, Kurth-Nelson Z, Redish AD (2012) Information processing in decision-making systems. Neuroscientist 18(4):342–359
Viejo G, Khamassi M, Brovelli A, Girard B (2015) Modeling choice and reaction time during arbitrary visuomotor learning through the coordination of adaptive working memory and reinforcement learning. Front Behav Neurosci 9:225
Wikenheiser AM, Schoenbaum G (2016) Over the river, through the woods: cognitive maps in the hippocampus and orbitofrontal cortex. Nat Rev Neurosci 17(8):513–523
Wilson MA, McNaughton BL (1994) Reactivation of hippocampal ensemble memories during sleep. Science 265(5172):676–679
Zhou J, Montesinos-Cartagena M, Wikenheiser AM, Gardner MP, Niv Y, Schoenbaum G (2019) Complementary task structure representations in hippocampus and orbitofrontal cortex during an odor sequence task. Curr Biol 29(20):3402–3409
Additional information
Communicated by Jean-Marc Fellous.
This article belongs to a Special Issue on Complex Spatial Navigation in Animals, Computational Models and Neuro-inspired Robots.
This work has received funding from the European Union’s Horizon 2020 Research and Innovation Program under Grant Agreement No. 640891 (DREAM Project), and from the CNRS 80|PRIME Research Program (RHiPAR Project). This work was performed within the Labex SMART (ANR-11-LABX-65) supported by French state funds managed by the ANR within the Investissements d’Avenir programme under reference ANR-11-IDEX-0004-02.
About this article
Cite this article
Khamassi, M., Girard, B. Modeling awake hippocampal reactivations with model-based bidirectional search. Biol Cybern 114, 231–248 (2020). https://doi.org/10.1007/s00422-020-00817-x
DOI: https://doi.org/10.1007/s00422-020-00817-x