Elsevier

Ecological Complexity

Volume 42, March 2020, 100815

A reinforcement learning-based predator-prey model

https://doi.org/10.1016/j.ecocom.2020.100815

Highlights

  • We use reinforcement learning algorithms to endow organisms with learning ability, and we simulate their evolution in a large-scale ecosystem using a Monte Carlo simulation algorithm.

  • The combination of the two algorithms allows organisms to use experience gained through interaction with the environment to determine their behavior, and to pass that experience on to their offspring.

  • Our results show that the reinforcement learning of predators is beneficial to the stability of the ecosystem, whereas the learning of prey makes the ecosystem oscillate and leads to a higher risk of extinction for predators.

Abstract

Classic population models can often predict the dynamics of biological populations in nature. However, the adaptation process and learning mechanism of species are rarely considered in the study of population dynamics, due to the complex interaction of species, seasonal variation, spatial distribution or other factors. We use reinforcement learning algorithms to improve existing individual-based ecosystem simulation algorithms, allowing species to spontaneously adjust their strategies according to a short period of experience and to feed that experience back to improve their action decisions. Our results show that the reinforcement learning of predators is beneficial to the stability of the ecosystem, and that predators can learn to spontaneously form hunting patterns that surround their prey. The learning of prey makes the ecosystem oscillate and leads to a higher risk of extinction for predators. When individuals are more likely to die, herbivores rely on reproductive behavior to maintain their populations; when individuals live longer, herbivores spend more time eating to maintain their own survival. The co-reinforcement learning of predators and prey helps predators find a more suitable way to survive with their prey: the number of predators is larger and more stable than when only predators or only prey learn.

Introduction

What determines the densities of different species such as predators, prey or plants, why do their numbers fluctuate and extinctions occur, and how do different species interact to determine each other’s abundance (Bonsall, Hassell, 2000; May, McLean, 2007)? These questions are addressed by ecological population dynamics. The most classical population dynamics model is the Lotka–Volterra (LV) model, which originally described the population dynamics of fish in the Adriatic Sea (Lotka, 1920; Volterra, 1928). The Lotka–Volterra model shows that predator-prey interactions have an inherent tendency to fluctuate and show oscillatory behavior (Hofbauer, Sigmund, 1998; May, Leonard, 1975; Rosenzweig, MacArthur, 1963). With the growth of computational power, numerous sophisticated models have been proposed and investigated; they complement the classical Lotka–Volterra model and allow a more realistic description of species interactions. The individual-based Monte Carlo simulation algorithm is an important method for studying the spatio-temporal characteristics of ecosystems in population dynamics. In the simulation, the mobility of individuals, chasing and escaping behaviors, the spatial distribution of populations and other factors can be observed directly (Anderson, Dragićević, 2018; Banerjee, Petrovskii, 2011; Carneiro, Charret, 2007; Droz, Pekalski, 2001; Droz, Pekalski, 2012; He, Täuber, Zia, 2012; Ni, Wang, Lai, Grebogi, 2010; Pekalski, Andrzej, Szwabinski, Janusz, 2013; Perc, Szolnoki, 2010; Reichenbach, Mobilia, Frey, 2008; Roman, Konrad, Pleimling, 2012; Szwabinski, Pekalski, Bena, Droz, 2010; Yokoyama, Noguchi, Nagano, 2008).
The spatially extended simulation algorithm promotes the local clustering of species, which increases the population densities of both competing species, avoids complete population extinction, and produces long-term unstable population oscillations or stable coexistence of species (Kang, Pan, Wang, He, 2013; Kang, Pan, Wang, He, 2016; Mohammed, Landi, Minoarivelo, Hui, 2018; Molina, Moreno-Armendáriz, Carlos, 2013; Wang, He, Kang, 2012; Wang, Pan, Kang, He, 2016).
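As a point of reference for the oscillatory behavior mentioned above, the classic (non-spatial) Lotka–Volterra equations can be integrated numerically in a few lines. The sketch below is only an illustration: the parameter values, the initial populations, and the function names `lotka_volterra_step`, `simulate` and `invariant` are assumptions chosen for the example and do not come from the paper.

```python
import math


def lotka_volterra_step(x, y, dt, alpha, beta, delta, gamma):
    """Advance prey x and predators y by one RK4 step of the classic model:

        dx/dt = alpha * x - beta * x * y
        dy/dt = delta * x * y - gamma * y
    """
    def deriv(px, py):
        return alpha * px - beta * px * py, delta * px * py - gamma * py

    k1x, k1y = deriv(x, y)
    k2x, k2y = deriv(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y)
    k3x, k3y = deriv(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y)
    k4x, k4y = deriv(x + dt * k3x, y + dt * k3y)
    x_new = x + dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
    y_new = y + dt * (k1y + 2 * k2y + 2 * k3y + k4y) / 6.0
    return x_new, y_new


def simulate(x0, y0, steps, dt=0.01, alpha=1.0, beta=0.1, delta=0.075, gamma=1.5):
    """Return the trajectory [(x, y), ...]; all default rates are illustrative."""
    trajectory = [(x0, y0)]
    x, y = x0, y0
    for _ in range(steps):
        x, y = lotka_volterra_step(x, y, dt, alpha, beta, delta, gamma)
        trajectory.append((x, y))
    return trajectory


def invariant(x, y, alpha=1.0, beta=0.1, delta=0.075, gamma=1.5):
    """Quantity conserved along exact Lotka-Volterra orbits (closed cycles)."""
    return delta * x - gamma * math.log(x) + beta * y - alpha * math.log(y)


if __name__ == "__main__":
    traj = simulate(10.0, 5.0, 2000)
    prey = [x for x, _ in traj]
    print("prey range:", min(prey), max(prey))
```

Because the exact orbits are closed cycles around the coexistence point, the populations in this non-spatial model oscillate indefinitely without settling, which is the baseline behavior that spatial and individual-based models modify.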

Individuals are also exposed by natural selection to a host of adaptive problems (Frankenhuis, Panchanathan, Barto, 2018; Hui, Landi, Minoarivelo, Ramanantoanina, 2018), which mainly arise on two scales: generational changes of genes, developmental systems or phenotypic traits on a long time scale, and changes within an individual from conception to death. Adaptation has been modelled in many ways on the evolutionary timescale. The approach most closely related to population dynamics is Adaptive Dynamics (Dercole, Rinaldi, 2008; Geritz, Kisdi, Meszena, Metz, 1998; Hui, Landi, Minoarivelo, Ramanantoanina, 2018), which has been used to study prey-predator interactions (Landi, Dercole, Rinaldi, 2013; Marrow, Dieckmann, Law, 1996) and food chains (Brännström, Loeuille, Loreau, Dieckmann, 2011; Dercole, Ferriere, Rinaldi, 2010; Hui, Minoarivelo, Landi, 2018) and was recently extended to spatial populations (Wickman et al., 2017); other approaches also exist (Wilsenach et al., 2017). On the other hand, individual learning is usually related to behavioral adaptation on the timescale of an individual lifetime (Beckerman, Petchey, Morin, 2010; Heckmann, Drossel, Brose, Guill, 2012; Kondoh, 2003; Zhang, Hui, 2014). These adaptations are not only the key to the long-term stability of complex communities but also the core of our understanding of how species assemble and function in ecosystems (Nuwagaba, Zhang, Hui, 2015; Nuwagaba, Zhang, Hui, 2017). On the whole, individuals need to learn and evolve to solve the adaptive problems that they face in nature, which changes the behavior of an individual from conception to death, as well as behavior across generations in a population. For example, stickleback fish have evolved learning mechanisms that allow them to process food more efficiently (Frankenhuis et al., 2018).
Since the real adaptation process is difficult to quantify and model, it is still necessary to construct more realistic and intelligent individual-based ecosystems (Gobeyn et al., 2019). Among artificial intelligence algorithms, reinforcement learning is a feasible method for studying the adaptive process of individuals. In a reinforcement learning algorithm, individuals can adjust, by trial and error, the experience inherited from their parents. However, most of the successful examples of reinforcement learning have been in single-agent domains, so successfully extending reinforcement learning to an environment with multiple species is critical for building a more intelligent ecosystem (Hou, Ong, Feng, Zurada, 2017; Li, Lillicrap, Hunt, Pritzel, Heess, Erez, Tassa, Silver, Wierstra, 2015; Mnih, Kavukcuoglu, Silver, Graves, Antonoglou, Wierstra, Riedmiller, 2013; Mnih, Kavukcuoglu, Silver, Rusu, Veness, Bellemare, Graves, Riedmiller, Fidjeland, Ostrovski, 2015; O’Donoghue, Munos, Kavukcuoglu, Mnih).

In this paper, we propose a framework that adds reinforcement learning abilities to individuals in an ecosystem simulation based on the Monte Carlo method. The actions of individuals include moving (chasing or escaping), hunting and feeding, breeding and so on. Training uses the Q-learning algorithm with an ϵ-greedy policy, and we use the experience replay method to store the recent memories of the population. Section 2 introduces the Monte Carlo simulation model, the Q-learning method and the algorithm with experience replay. Section 3 shows the results of the ecosystem simulation. Finally, we conclude this paper in Section 4.
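To make the combination of Q-learning, ϵ-greedy action selection and experience replay concrete, the following is a minimal tabular sketch. It is not the authors' implementation: the class name `ReplayQLearner`, the state and action encoding, and the values of `learning_rate`, `discount`, `epsilon` and `capacity` are all assumptions made for illustration, since the paper's state representation and reward design are not given in this excerpt.

```python
import random
from collections import deque


class ReplayQLearner:
    """Tabular Q-learning with epsilon-greedy actions and a replay memory."""

    def __init__(self, n_actions, learning_rate=0.1, discount=0.9,
                 epsilon=0.1, capacity=1000, seed=0):
        self.n_actions = n_actions
        self.learning_rate = learning_rate  # hypothetical step size
        self.discount = discount            # hypothetical discount factor
        self.epsilon = epsilon              # probability of a random action
        self.q = {}                         # state -> list of action values
        self.memory = deque(maxlen=capacity)  # oldest transitions drop out
        self.rng = random.Random(seed)

    def values(self, state):
        """Return (and lazily create) the action-value list for a state."""
        return self.q.setdefault(state, [0.0] * self.n_actions)

    def act(self, state):
        """Explore with probability epsilon, otherwise pick the greedy action."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        vals = self.values(state)
        return max(range(self.n_actions), key=vals.__getitem__)

    def remember(self, state, action, reward, next_state, done):
        """Store one transition in the replay memory."""
        self.memory.append((state, action, reward, next_state, done))

    def update(self, state, action, reward, next_state, done):
        """One Q-learning update: Q <- Q + lr * (target - Q)."""
        if done:
            target = reward
        else:
            target = reward + self.discount * max(self.values(next_state))
        vals = self.values(state)
        vals[action] += self.learning_rate * (target - vals[action])

    def replay(self, batch_size):
        """Re-apply the update to a random batch of stored transitions."""
        k = min(batch_size, len(self.memory))
        for transition in self.rng.sample(list(self.memory), k):
            self.update(*transition)
```

In a simulation loop, each individual would call `act` to choose among its available actions, `remember` the resulting transition, and periodically call `replay`. Sharing one memory across a population, or copying the table `q` to offspring, is one possible way to model inherited experience, but that design choice is likewise an assumption here.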

Section snippets

Monte Carlo simulation

We consider a spatial scenario in which individuals are placed on a square lattice of linear size L with periodic boundary conditions (Wang et al., 2012). Three types of individuals can occur in our model: predators X, prey Y and plants Z. Predators eat prey, which feed on plants. Each species can appear at most once at a given site at a given time, so the state of a site is one of the following eight states: s1: empty; s2: one unit of plants; s3: a prey; s4: one unit of plants and a
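The lattice described in this section can be sketched as follows. With three species that are each either present or absent at a site, there are 2^3 = 8 possible site states, which matches the count given in the text. The bit-flag encoding, the four-neighbour (von Neumann) neighbourhood, the initial occupation probabilities and the function names are all assumptions for illustration; the numbering here is not meant to reproduce the paper's s1 to s8 ordering.

```python
import random

# Hypothetical bit flags: a site state is any combination of these,
# giving integer states 0..7 (eight states in total).
PLANT, PREY, PREDATOR = 1, 2, 4


def make_lattice(size, p_plant, p_prey, p_predator, seed=0):
    """Build a size x size lattice with independent random occupation."""
    rng = random.Random(seed)
    lattice = [[0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            state = 0
            if rng.random() < p_plant:
                state |= PLANT
            if rng.random() < p_prey:
                state |= PREY
            if rng.random() < p_predator:
                state |= PREDATOR
            lattice[i][j] = state
    return lattice


def neighbors(i, j, size):
    """Four nearest neighbours of site (i, j) with periodic wrap-around."""
    return [((i - 1) % size, j), ((i + 1) % size, j),
            (i, (j - 1) % size), (i, (j + 1) % size)]


def count(lattice, flag):
    """Number of sites on which the given species is present."""
    return sum(1 for row in lattice for state in row if state & flag)


if __name__ == "__main__":
    lattice = make_lattice(200, 0.5, 0.2, 0.05)
    print("prey:", count(lattice, PREY), "predators:", count(lattice, PREDATOR))
    print("neighbours of (0, 0):", neighbors(0, 0, 200))
```

The modulo arithmetic in `neighbors` is what implements the periodic boundary conditions: an individual leaving one edge of the lattice re-enters from the opposite edge.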

Results

Full exploration of the parameter space is possible only if there are not too many parameters. As in nearly all theoretical models, we do not try to faithfully reproduce a given ecosystem. Our previous research describes the impact of the parameters on the ecosystem in more detail (Wang, He, Kang, 2012; Wang, Pan, Kang, He, 2016). Let the set of basic parameters be (L, phX, pbX, pbY, fX, fY) = (200, 0.8, 0.1, 0.8, 10, 5). All simulations were run with initial random spatial distributions. The results reported below

Conclusion

We have considered a reinforcement learning-based predator-prey model using Monte Carlo simulation. Unlike previous adaptive models (Beckerman, Petchey, Morin, 2010; Heckmann, Drossel, Brose, Guill, 2012; Kondoh, 2003), our model takes into account the learning-based behavioral evolution of predators and prey in a two-dimensional environment. By using the reinforcement learning algorithm, predators and prey learn to choose actions and update policies based on their current and expected

CRediT authorship contribution statement

Xueting Wang: Investigation, Writing - review & editing. Jun Cheng: Investigation, Writing - review & editing. Lei Wang: Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no conflicts of interest related to this work. We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

Acknowledgments

We gratefully acknowledge the anonymous referees for their helpful comments. This work was supported by the National Natural Science Foundation of China (U1713213, 61772508, U1813205), the Key Research and Development Program of Guangdong Province (grant number 2019B090915001), the Shenzhen Technology Project (JCYJ20170413152535587, JSGG20170823091924128, JCYJ20170307164023599, JCYJ20180507182610734), and the CAS Key Technology Talent Program.

References (53)

  • X. Wang et al.

    A computational predator-prey model, pursuit-evasion behavior based on different range of vision

    Physica A Stat. Mech. Appl.

    (2012)
  • X. Wang et al.

    Predator group size distributions in predator-prey systems

    Ecol. Complex.

    (2016)
  • F. Zhang et al.

    Recent experience-driven behaviour optimizes foraging

    Anim. Behav.

    (2014)
  • M. Banerjee et al.

    Self-organised spatial patterns and chaos in a ratio-dependent predator-prey system

    Theor. Ecol.

    (2011)
  • A. Beckerman et al.

    Adaptive foragers and community ecology: linking individuals to communities and ecosystems

    Funct. Ecol.

    (2010)
  • M.B. Bonsall et al.

    The effects of metapopulation structure on indirect interactions in host-parasitoid assemblages.

    Proc. Biol. Sci.

    (2000)
  • Å. Brännström et al.

    Emergence and maintenance of biodiversity in an evolutionary food-web model

    Theor. Ecol.

    (2011)
  • M.V. Carneiro et al.

    Spontaneous emergence of spatial patterns in a predator-prey model.

    Phys. Rev. E

    (2007)
  • Chen, Y., Wang, M., 2016. Stochastic primal-dual methods and sample complexity of reinforcement learning. ArXiv...
  • F. Dercole et al.

    Chaotic red queen coevolution in three-species food chains

    Proc. R. Soc. B

    (2010)
  • F. Dercole et al.

    Analysis of evolutionary processes: the adaptive dynamics approach and its applications

    (2008)
  • M. Droz et al.

    Coexistence in a predator-prey system.

    Phys. Rev. E

    (2001)
  • W.E. Frankenhuis et al.

    Enriching behavioral ecology with reinforcement learning methods

    Behav. Process.

    (2018)
  • S. Geritz et al.

    Evolutionarily singular strategies and the adaptive growth and branching of the evolutionary tree

    Evol. Ecol.

    (1998)
  • Q. He et al.

    On the relationship between cyclic and hierarchical three-species predator-prey systems and the two-species Lotka-Volterra model

    Eur. Phys. J. B

    (2012)
  • L. Heckmann et al.

    Interactive effects of body-size structure and adaptive foraging on food-web stability

    Ecol. Lett.

    (2012)