A reinforcement learning-based predator-prey model

doi:10.1016/j.ecocom.2020.100815

Ecological Complexity

Volume 42, March 2020, 100815

Highlights

• We use reinforcement learning algorithms to endow organisms with learning ability, and simulate their evolution in a large-scale ecosystem using a Monte Carlo simulation algorithm.
• Combining the two algorithms allows organisms to use experience gained through interaction with the environment to determine their behavior, and to pass that experience on to their offspring.
• Our results show that reinforcement learning by predators is beneficial to the stability of the ecosystem, whereas learning by prey makes the ecosystem oscillate and leads to a higher risk of extinction for predators.

Abstract

Classic population models can often predict the dynamics of biological populations in nature. However, the adaptation process and learning mechanisms of species are rarely considered in studies of population dynamics, owing to the complex interactions among species, seasonal variation, spatial distribution and other factors. We use reinforcement learning algorithms to improve existing individual-based ecosystem simulation algorithms, allowing species to spontaneously adjust their strategies according to a short period of experience and to feed that experience back into their action decisions. Our results show that reinforcement learning by predators is beneficial to the stability of the ecosystem, and that predators can learn to spontaneously form hunting patterns that surround their prey. Learning by prey makes the ecosystem oscillate and leads to a higher risk of extinction for predators. When individuals are more likely to die, the herbivores rely on reproduction to maintain their populations; when individuals live longer, the herbivores spend more time eating to sustain themselves. Co-reinforcement learning of predators and prey helps predators find a more suitable way to coexist with their prey: the number of predators is more stable and larger than when only predators or only prey learn.

Introduction

What determines the densities of different species such as predators, prey or plants, why do their numbers fluctuate and extinctions occur, and how do different species interact to determine each other’s abundance? (Bonsall, Hassell, 2000, May, McLean, 2007) These questions are addressed by ecological population dynamics. The most classical population dynamics model is the Lotka–Volterra (LV) model, which originally described the population dynamics of fish in the Adriatic Sea (Lotka, 1920, Volterra, 1928). The Lotka–Volterra model shows that predator-prey interactions have an inherent tendency to fluctuate and show oscillatory behavior (Hofbauer, Sigmund, 1998, May, Leonard, 1975, Rosenzweig, MacArthur, 1963). With the growth of computational power, numerous more sophisticated models have been proposed and investigated; they complement the classical Lotka–Volterra model and allow a more realistic description of species interactions. Individual-based Monte Carlo simulation is an important method for studying the spatio-temporal characteristics of ecosystems in population dynamics. In such simulations, the mobility of individuals, chasing and escaping behaviors, the spatial distribution of populations and other factors can be observed directly (Anderson, Dragićević, 2018, Banerjee, Petrovskii, 2011, Carneiro, Charret, 2007, Droz, Pekalski, 2001, Droz, Pekalski, 2012, He, Täuber, Zia, 2012, Ni, Wang, Lai, Grebogi, 2010, Pekalski, Andrzej, Szwabinski, Janusz, 2013, Perc, Szolnoki, 2010, Reichenbach, Mobilia, Frey, 2008, Roman, Konrad, Pleimling, 2012, Szwabinski, Pekalski, Bena, Droz, 2010, Yokoyama, Noguchi, Nagano, 2008).
Spatially extended simulation promotes local clustering of species, which increases the population densities of both competing species, avoids complete population extinction, and produces either long-lived unstable population oscillations or stable coexistence of species (Kang, Pan, Wang, He, 2013, Kang, Pan, Wang, He, 2016, Mohammed, Landi, Minoarivelo, Hui, 2018, Molina, Moreno-Armendáriz, Carlos, 2013, Wang, He, Kang, 2012, Wang, Pan, Kang, He, 2016).
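
For reference, the classical Lotka–Volterra model mentioned above is usually written as the pair of equations below. The symbols ($x$ for prey density, $y$ for predator density, and positive rate constants $\alpha, \beta, \gamma, \delta$) follow common textbook notation and are an assumption here, not notation taken from this paper.

$$
\frac{dx}{dt} = \alpha x - \beta x y, \qquad \frac{dy}{dt} = \delta x y - \gamma y
$$

Along any solution with $x, y > 0$, the quantity $V(x, y) = \delta x - \gamma \ln x + \beta y - \alpha \ln y$ is conserved, since $\frac{dV}{dt} = (\delta x - \gamma)(\alpha - \beta y) + (\beta y - \alpha)(\delta x - \gamma) = 0$. Trajectories are therefore closed orbits around the coexistence point $(x^{*}, y^{*}) = (\gamma / \delta, \alpha / \beta)$, which is the oscillatory behavior the text refers to.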

Natural selection also exposes individuals to a host of adaptive problems (Frankenhuis, Panchanathan, Barto, 2018, Hui, Landi, Minoarivelo, Ramanantoanina, 2018), which mainly play out on two time scales: generational changes in genes, developmental systems or phenotypic traits over long time scales, and changes within an individual from conception to death. On the evolutionary timescale, adaptation has been modelled in many ways. The approach most closely related to population dynamics is Adaptive Dynamics (Dercole, Rinaldi, 2008, Geritz, Kisdi, Meszena, Metz, 1998, Hui, Landi, Minoarivelo, Ramanantoanina, 2018), which has been used to study predator-prey interactions (Landi, Dercole, Rinaldi, 2013, Marrow, Dieckmann, Law, 1996) and food chains (Brännström, Loeuille, Loreau, Dieckmann, 2011, Dercole, Ferriere, Rinaldi, 2010, Hui, Minoarivelo, Landi, 2018), and has recently been extended to spatial populations (Wickman et al., 2017); other approaches also exist (Wilsenach et al., 2017). Individual learning, on the other hand, is usually related to behavioral adaptation on the timescale of an individual lifetime (Beckerman, Petchey, Morin, 2010, Heckmann, Drossel, Brose, Guill, 2012, Kondoh, 2003, Zhang, Hui, 2014). These adaptations are key to the long-term stability of complex communities and central to our understanding of how species assemble and function in ecosystems (Nuwagaba, Zhang, Hui, 2015, Nuwagaba, Zhang, Hui, 2017). On the whole, individuals need to learn and evolve to solve the adaptive problems they face in nature, which changes the behavior of an individual between conception and death as well as the behavior of a population across generations. For example, stickleback fish have evolved learning mechanisms that allow them to process food more efficiently (Frankenhuis et al., 2018).
Since the real adaptation process is difficult to quantify and model, more realistic and intelligent individual-based ecosystem models are still needed (Gobeyn et al., 2019). Among artificial intelligence algorithms, reinforcement learning is a feasible method for studying the adaptive process of individuals. In a reinforcement learning algorithm, individuals learn by trial and error, adjusting the experience inherited from their parents. However, most successful examples of reinforcement learning have been in single-agent domains, so successfully extending reinforcement learning to an environment with multiple species is critical for building a more intelligent ecosystem (Hou, Ong, Feng, Zurada, 2017, Li, Lillicrap, Hunt, Pritzel, Heess, Erez, Tassa, Silver, Wierstra, 2015, Mnih, Kavukcuoglu, Silver, Graves, Antonoglou, Wierstra, Riedmiller, 2013, Mnih, Kavukcuoglu, Silver, Rusu, Veness, Bellemare, Graves, Riedmiller, Fidjeland, Ostrovski, 2015, O’Donoghue, Munos, Kavukcuoglu, Mnih).

In this paper, we propose a framework that gives individuals reinforcement learning abilities in a Monte Carlo ecosystem simulation. The actions of individuals include moving (chasing or escaping), hunting and feeding, breeding and so on. Training uses the Q-learning algorithm with an $\epsilon$-greedy policy, and we use the experience replay method to store the recent memories of the population. Section 2 introduces the Monte Carlo simulation model, the Q-learning method and the algorithm with experience replay. Section 3 presents the results of the ecosystem simulation. Finally, we conclude the paper in Section 4.
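
The combination of Q-learning, $\epsilon$-greedy action selection and experience replay named above can be sketched as follows. This is a minimal tabular illustration, not the authors' implementation: the class name `QLearner`, the table representation, and the values of `alpha`, `gamma`, `epsilon` and `buffer_size` are all assumptions made for the example, and the paper's actual state encoding, reward design and function approximator are not given in this excerpt.

```python
import random
from collections import deque


class QLearner:
    """Tabular Q-learning with epsilon-greedy exploration and a replay buffer.

    All names and default hyperparameters are hypothetical illustrations.
    """

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9,
                 epsilon=0.1, buffer_size=1000, seed=0):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.n_actions = n_actions
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        # A bounded deque drops the oldest transitions, so only recent
        # experience is kept.
        self.buffer = deque(maxlen=buffer_size)
        self.rng = random.Random(seed)

    def act(self, state):
        """Pick a random action with probability epsilon, else the greedy one."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        row = self.q[state]
        return row.index(max(row))

    def remember(self, state, action, reward, next_state, done):
        """Store one transition in the replay buffer."""
        self.buffer.append((state, action, reward, next_state, done))

    def update(self, state, action, reward, next_state, done):
        """Apply one Q-learning update towards the bootstrapped target."""
        target = reward
        if not done:
            target += self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])

    def replay(self, batch_size=32):
        """Re-learn from a random sample of stored transitions."""
        k = min(batch_size, len(self.buffer))
        for transition in self.rng.sample(list(self.buffer), k):
            self.update(*transition)
```

In a simulation loop, each individual would call `act` to choose a move, `remember` to log the outcome, and `replay` periodically. Sharing one buffer across a species would be one way to model a population-level memory, although the excerpt does not say how the authors organise it.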

Section snippets

Monte Carlo simulation

We consider a spatial scenario in which individuals are placed on a square lattice of linear size $L$ with periodic boundary conditions (Wang et al., 2012). Three types of individuals can occur in our model: predators X, prey Y and plants Z. Predators eat prey, which feed on plants. Each species can appear at most once at a given site at a given time, so the state of a site is one of the following eight states: $s_{1}$: empty, $s_{2}$: one unit of plants, $s_{3}$: a prey, $s_{4}$: one unit of plants and a
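
The lattice described above can be sketched as below: a site holds at most one predator, one prey and one unit of plants, which gives $2^3 = 8$ possible states, and periodic boundaries are handled with modular arithmetic. The bit-flag encoding, the function names, the four-neighbor (von Neumann) neighborhood and the `density` parameter are assumptions made for illustration; the excerpt is truncated before it lists all eight states or the neighborhood used.

```python
import random

# Hypothetical bit flags: one bit per species present at a site.
PLANT, PREY, PREDATOR = 0b001, 0b010, 0b100


def make_lattice(L, density=0.2, seed=0):
    """Return an L x L grid of site states drawn at random.

    Each species occupies a site independently with probability `density`
    (an illustrative choice, not a value from the paper).
    """
    rng = random.Random(seed)
    return [[sum(flag for flag in (PLANT, PREY, PREDATOR)
                 if rng.random() < density)
             for _ in range(L)]
            for _ in range(L)]


def neighbors(x, y, L):
    """Four nearest neighbors of (x, y) with periodic wrap-around."""
    return [((x + 1) % L, y), ((x - 1) % L, y),
            (x, (y + 1) % L), (x, (y - 1) % L)]


def describe(site):
    """Human-readable description of a site state."""
    names = [name for flag, name in
             ((PREDATOR, "predator"), (PREY, "prey"), (PLANT, "plant"))
             if site & flag]
    return " + ".join(names) if names else "empty"
```

With this encoding the integer codes 0 to 3 line up with the first states listed in the text (empty, plants, prey, plants plus a further occupant); the order of the remaining codes is arbitrary here.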

Results

Full exploration of the parameter space is possible only if there are not too many parameters. As in nearly all theoretical models, we do not try to reproduce a given ecosystem faithfully. Our previous work discusses the impact of the parameters on the ecosystem in more detail (Wang, He, Kang, 2012, Wang, Pan, Kang, He, 2016). Let the set of basic parameters be $(L, p_{h}^{X}, p_{b}^{X}, p_{b}^{Y}, f_{X}, f_{Y}) = (200, 0.8, 0.1, 0.8, 10, 5)$. All simulations were run with random initial spatial distributions. The results reported below

Conclusion

We have considered a reinforcement learning-based predator-prey model using Monte Carlo simulation. Unlike previous adaptive models (Beckerman, Petchey, Morin, 2010, Heckmann, Drossel, Brose, Guill, 2012, Kondoh, 2003), our model takes into account the learning-based behavioral evolution of predators and prey in a two-dimensional environment. By using the reinforcement learning algorithm, predators and prey learn to choose actions and update policies based on their current and expected

CRediT authorship contribution statement

Xueting Wang: Investigation, Writing - review & editing. Jun Cheng: Investigation, Writing - review & editing. Lei Wang: Writing - review & editing.

Declaration of Competing Interest

The authors declared that they have no conflicts of interest to this work. We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

Acknowledgments

We gratefully acknowledge the kind anonymous referees for their helpful comments. This work was supported by National Natural Science Foundation of China (U1713213, 61772508, U1813205), Key Research and Development Program of Guangdong Province [grant numbers 2019B090915001], Shenzhen Technology Project (JCYJ20170413152535587, JSGG20170823091924128, JCYJ20170307164023599, JCYJ20180507182610734), and CAS Key Technology Talent Program.

References (53)

T.M. Anderson et al. Network-agent based model for simulating the dynamic spatial network structure of complex ecological systems. Ecol. Modell. (2018)
M. Droz et al. Role of asymmetry in competition for light in a model of annual plants. Physica A Stat. Mech. Appl. (2012)
S. Gobeyn et al. Evolutionary algorithms for species distribution modelling: a review in the context of machine learning. Ecol. Modell. (2019)
Y. Kang et al. A golden point rule in rock paper scissors lizard spock game. Physica A Stat. Mech. Appl. (2013)
M. Mohammed et al. Frugivory and seed dispersal: extended bi-stable persistence and reduced clustering of plants. Ecol. Modell. (2018)
M.M. Molina et al. On the spatial dynamics and oscillatory behavior of a predator-prey model based on cellular automata and local particle swarm optimization. J. Theor. Biol. (2013)
Pekalski et al. Dynamics of three types of annual plants competing for water and light. Physica A Stat. Mech. Appl. (2013)
M. Perc et al. Coevolutionary games–a mini review. Biosystems (2010)
T. Reichenbach et al. Self-organization of mobile populations in cyclic competition. J. Theor. Biol. (2008)
J. Szwabinski et al. Food web model with detritus path. Physica A Stat. Mech. Appl. (2010)
X. Wang et al. A computational predator-prey model, pursuit-evasion behavior based on different range of vision. Physica A Stat. Mech. Appl. (2012)
X. Wang et al. Predator group size distributions in predator-prey systems. Ecol. Complex. (2016)
F. Zhang et al. Recent experience-driven behaviour optimizes foraging. Anim. Behav. (2014)
M. Banerjee et al. Self-organised spatial patterns and chaos in a ratio-dependent predator-prey system. Theor. Ecol. (2011)
A. Beckerman et al. Adaptive foragers and community ecology: linking individuals to communities and ecosystems. Funct. Ecol. (2010)
M.B. Bonsall et al. The effects of metapopulation structure on indirect interactions in host-parasitoid assemblages. Proc. Biol. Sci. (2000)
Å. Brännström et al. Emergence and maintenance of biodiversity in an evolutionary food-web model. Theor. Ecol. (2011)
M.V. Carneiro et al. Spontaneous emergence of spatial patterns in a predator-prey model. Phys. Rev. E (2007)
Chen, Y., Wang, M., 2016. Stochastic primal-dual methods and sample complexity of reinforcement learning. ArXiv...
F. Dercole et al. Chaotic red queen coevolution in three-species food chains. Proc. R. Soc. B (2010)
F. Dercole et al. Analysis of evolutionary processes: the adaptive dynamics approach and its applications. (2008)
M. Droz et al. Coexistence in a predator-prey system. Phys. Rev. E (2001)
W.E. Frankenhuis et al. Enriching behavioral ecology with reinforcement learning methods. Behav. Process. (2018)
S. Geritz et al. Evolutionarily singular strategies and the adaptive growth and branching of the evolutionary tree. Evol. Ecol. (1998)
Q. He et al. On the relationship between cyclic and hierarchical three-species predator-prey systems and the two-species Lotka-Volterra model. Eur. Phys. J. B (2012)
L. Heckmann et al. Interactive effects of body-size structure and adaptive foraging on food-web stability. Ecol. Lett. (2012)

