1 Introduction

Cooperation is one of the most important issues in population dynamics. Although creatures act selfishly according to Darwin's theory, they cooperate with each other despite the costs involved. This puzzle has attracted a great deal of attention among researchers [1]. A powerful tool for investigating the emergence of cooperation is game theory. A game is an abstraction of the relationships between individuals or species. According to the game rules, rational players compete against each other, and each player's goal is to maximize its payoff. The payoff indicates the amount of benefit that a player gains [2]. With the advent of evolutionary game theory, the success of individuals can be decoupled from rational reasoning, and populations can evolve toward collective interest rather than purely selfish choices.

The prisoner's dilemma game (PDG) is a game in which cooperation plays a central role. In this game, players choose between two strategies, defined as cooperation and defection [2, 3]. According to the PDG payoff matrix, defection is the better choice for each player regardless of the opponent's strategy [4]. The puzzle is that although mutual cooperation benefits both players, any rational player would choose to defect [3, 5, 6].

In recent years, researchers have carried out extensive work on the PDG to resolve this puzzle, studying ingredients that affect cooperation such as different networks (spatial structure) [2, 5, 7, 8], edge rewiring [9, 10], noise [5, 11, 12], and different update rules [13]. In addition, many mechanisms have been used to increase cooperation, such as time delay [14, 15], partner switching [16], migration [17], collective decision-making [18], etc. One notable line of work investigates the effect of memory on strategy change, first explored by Axelrod. Axelrod introduced the tit-for-tat strategy (cooperating in the first round and then doing whatever the opponent did in the last round) [19] and studied the essential role of a player's memory in the repeated game. Thereafter, a multitude of studies have examined the effect of memory on cooperation dynamics. Most proposed models were based on the imitation update rule [20]; other update rules have received less attention, although some researchers believe they can give a more authentic description of society's behavior.

In the real world, instead of imitating others, many people prefer to identify the best strategy for themselves individually, based on their own specific condition, namely the strategic composition of the players they interact with. In the simplest form of this update rule, players decide based on the current strategies of their neighbors, assuming that those strategies are stable and will not change, at least in the near future. This update rule is called logit and is the noisy version of the myopic best response update rule [21, 22]. It allows a player to select any of her possible strategies with a probability that depends on the payoff variation. The concept of potential games becomes fruitful for this special version, which drives the system into the so-called Gibbs state characterized by the Boltzmann distribution. For these special cases, we can utilize the enormous body of results obtained in statistical and solid-state physics (such as the Ising model) to explain phenomena emerging in social and biological systems, too [23]. This update rule is also coupled with a simple-minded predictive rule and a naive view of consequences [24]. These factors give this update rule a typically great potential to destroy cooperation in society. The situation gets worse in games such as the PDG (or its special case, the donation game, which is used in the current study), where defection is a dominant strategy (the best choice regardless of the opponent's strategy). Compared to the ordinary logit update rule, ordinary imitation is more resistant to defection. This is because, in the imitation update rule, society's behavior, apart from the nature of the game, indirectly plays a role in people's decision-making process, even if this role is not obvious. The effect of society's behavior on decision-making is a factor whose absence is felt more strongly in the logit update rule. To fill this gap, decisions need to be based on deeper and wider information.

In the current study, to fill this gap, we improve the ordinary logit update rule by expanding the source of information from which players estimate the profitability of each strategy with two extra sources. The first source is the information that comes from monitoring the activity of a larger group of neighbors; the second comes from monitoring their activity over a longer period of time. The size of the group from which the first kind of information, called environmental information, is collected is measured by a parameter named vision range. The second kind is called memory, and using it along with environmental information gives more depth and credibility to players' estimations of strategy profitability.

In this work, we use a special case of the PDG known as the donation game. It is important to note that although the coordination component is missing in the original donation game, it appears in the effective interaction if the strategy update is directly affected by the neighboring strategies.

This paper is organized as follows. In Sect. 2, the considered model and the simulation are described in detail. Next, in Sect. 3, the results are presented and their implications are discussed. Finally, concluding remarks are given in Sect. 4.

2 Model

We consider the donation game with two players and two strategies, named C and D. A player's income depends on its own strategy as well as its neighbors' strategies. This income is defined by the following payoff matrix:

$$\begin{aligned} A = \begin{pmatrix} b-c &{} -c \\ b &{} 0 \end{pmatrix} = b \begin{pmatrix} 1-r &{} -r \\ 1 &{} 0 \end{pmatrix}, \end{aligned}$$
(1)

where \(r=c/b\) represents the cost-to-benefit ratio, rows correspond to the focal player's strategy (C, D), and columns to the opponent's. In the present model, each player occupies one site of a square lattice with periodic boundary conditions and collects income by playing with four neighbors (von Neumann neighborhood). In addition, an update-rate parameter represents the percentage of players that update their strategies synchronously in each time step; in each step, these players are chosen randomly. Strategies are updated according to the logit update rule. In this rule, a player's strategy may change based on the difference between the current income and the income the player would get by switching strategy. Each player switches his current strategy to the opposite one with the following probability:

$$\begin{aligned} w = \frac{1}{1 + \exp (\frac{{u_i - u'_i}}{K})}, \end{aligned}$$
(2)

where K quantifies the probability that mistakes or irrational choices occur; \(K=0.01\) in all simulations. \(u_i\) and \(u'_i\) estimate player i's profit from the current and the opposite strategy, respectively. In the original logit update rule, these estimations are based only on each player's individual current condition, such that \(u_i\) is player i's total payoff and \(u'_i\) is the total payoff that player i would receive by switching strategy, assuming the neighbors' strategies to be fixed. In this work, the estimations are not based exclusively on each player's individual current condition but also on the condition of the other players he has vision over. From this point of view, \(U_i\) and \(U'_i\) are determined by the following equations:

$$\begin{aligned}&U_{i}(t) =\frac{u_i + \sum _{j\in G_v} u_j \, \delta _{s_i,s_j}}{1 + \sum _{j\in G_v} \delta _{s_i,s_j}}, \end{aligned}$$
(3)
$$\begin{aligned}&U'_{i}(t) =\frac{u'_i + \sum _{j\in G_v} u_j \, \delta _{s'_i,s_j}}{1 + \sum _{j\in G_v} \delta _{s'_i,s_j}}, \end{aligned}$$
(4)

where \(G_v\) is the group of players that player i monitors and v is the players' vision range, such that \(G_0\) contains no neighbors, \(G_1\) contains the first neighbors, \(G_2\) contains the first and second neighbors, and so on (see Fig. 1). \(s_i\) and \(s_j\) are the strategies of players i and j, respectively, \(s'_i\) is player i's opposite strategy, and \(\delta \) is the Kronecker delta. In addition, the estimation can also depend on information about previous time steps, which means having memory. The players' memories of cooperation and defection at time step t are defined as follows:

Fig. 1 Players whom player i is monitoring for a vision range = 0, b vision range = 1, c vision range = 2, and d vision range = 4

$$\begin{aligned}&M_i^C (t) =\frac{u_i \, \delta _{s_i,C} + \sum _{j\in G_v} u_j \, \delta _{s_j,C}}{\delta _{s_i,C} + \sum _{j\in G_v}\delta _{s_j,C}}, \end{aligned}$$
(5)
$$\begin{aligned}&M_i^D (t) =\frac{u_i \, \delta _{s_i,D} + \sum _{j\in G_v} u_j \, \delta _{s_j,D}}{\delta _{s_i,D} + \sum _{j\in G_v} \delta _{s_j,D}}, \end{aligned}$$
(6)

where \(M_i^C (t)\) is player i's memory of cooperation at time t, consisting of the average payoff of the cooperators (the player himself included, if he cooperates) within his vision range at that time step, and \(M_i^D (t)\) is defined analogously for defection. By combining the information that comes from the activity of players in the vision range in the current and previous time steps, player i's final estimations of the profitability of the current and the opposite strategy are as follows:

$$\begin{aligned}&E_i(t) =\frac{U_{i}(t)+\sum _{\tau =0}^{t-1} e^{\alpha (\tau -t)}\, M_i^{s_i} (\tau ) \, A_i^{s_i}(\tau )}{1+\sum _{\tau =0}^{t-1} e^{\alpha (\tau -t)} \, A_i^{s_i} (\tau )}, \end{aligned}$$
(7)
$$\begin{aligned}&E'_i(t) =\frac{U'_{i}(t)+\sum _{\tau =0}^{t-1} e^{\alpha (\tau -t)}\, M_i^{s'_i} (\tau ) \, A_i^{s'_i} (\tau )}{1+\sum _{\tau =0}^{t-1} e^{\alpha (\tau -t)} \, A_i^{s'_i} (\tau )}, \end{aligned}$$
(8)

where \(A_i^{s_i} (\tau )\) is 0 if there is no player with strategy \(s_i\) in the vision range of player i at time step \(\tau \), and \(A_i^{s'_i} (\tau )\) is 0 if there is no player with strategy \(s'_i\) in the vision range of that player at that time step; otherwise, they are 1. The parameter \(\alpha \) is the memory damping coefficient: the limit \(\alpha \rightarrow 0 \) indicates complete memory and \(\alpha \rightarrow \infty \) indicates a memoryless condition. In most previous studies on the role of memory, memory length has been modeled such that information from all previous steps within the memory length has the same weight. In the current study, by contrast, the damping gives past information exponentially decreasing weights, so that information from more distant steps is less relevant; likewise, as the damping coefficient decreases, the impact of past information increases compared to the present one. Finally, the strategy switching probability on which the current simulation is based results from replacing \(u_i\) and \(u'_i\) in Eq. (2) with \(E_i\) and \(E'_i\), respectively, which are more reliable estimations of strategy profitability:

$$\begin{aligned} w = \frac{1}{1 + \exp (\frac{{E_i - E'_i}}{K})}. \end{aligned}$$
(9)
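
To make the update rule concrete, the following sketch implements Eqs. (3)–(9) for a single focal player. This is a minimal illustration under our reading of the model, not the authors' code: strategies are encoded as 1 (C) and 0 (D), the payoff helper uses the normalized donation-game matrix of Eq. (1), and names such as `env_estimate` and `memory_entry` are our own.

```python
import numpy as np

def donation_payoff(s_i, neighbor_strategies, r):
    """Total payoff against all direct neighbors (1 = C, 0 = D):
    receive 1 from each cooperating neighbor, pay r per act of cooperation."""
    return sum(s_j - r * s_i for s_j in neighbor_strategies)

def env_estimate(u_focal, s_focal, u_group, s_group):
    """Eqs. (3)-(4): average payoff over same-strategy players in the vision
    group G_v, the focal player's (actual or hypothetical) payoff included."""
    same = [u for u, s in zip(u_group, s_group) if s == s_focal]
    return (u_focal + sum(same)) / (1 + len(same))

def memory_entry(strategy, u_i, s_i, u_group, s_group):
    """Eqs. (5)-(6): average payoff of players holding `strategy` in the
    vision range; returns (M, A) with A = 0 when no such player exists."""
    vals = [u for u, s in zip([u_i] + list(u_group), [s_i] + list(s_group))
            if s == strategy]
    return (sum(vals) / len(vals), 1) if vals else (0.0, 0)

def final_estimate(U_now, history, alpha):
    """Eqs. (7)-(8): damp past (M, A) pairs, history[tau] for tau = 0..t-1,
    with weights exp(alpha * (tau - t))."""
    t = len(history)
    num, den = U_now, 1.0
    for tau, (M, A) in enumerate(history):
        w = np.exp(alpha * (tau - t)) * A
        num, den = num + w * M, den + w
    return num / den

def switch_probability(E_i, E_i_prime, K=0.01):
    """Eq. (9): logit switching probability; argument clipped to avoid
    floating-point overflow at small K."""
    x = np.clip((E_i - E_i_prime) / K, -500.0, 500.0)
    return 1.0 / (1.0 + np.exp(x))

# Example: a lone focal defector (s=0) among four cooperators, r = 0.08.
u_now = donation_payoff(0, [1, 1, 1, 1], 0.08)        # 4.0
u_alt = donation_payoff(1, [1, 1, 1, 1], 0.08)        # 3.68
print(switch_probability(u_now, u_alt))               # ~0: defection keeps paying
```

Since the weights in Eqs. (7) and (8) decay geometrically, both sums can also be maintained incrementally, multiplying the accumulated numerator and denominator by \(e^{-\alpha }\) at each step instead of storing the full history.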

3 Results and discussion

The simulation in the present study is carried out on a uniform two-dimensional square lattice with periodic boundary conditions, where each node is connected to four neighbors. Initially, the two strategies C and D are distributed in equal numbers randomly across the network, and no player has any memory of either strategy. The main results are obtained for a lattice of size \(12 \times 12\), and they are compared with the results of larger lattices in some cases. In this study, the population does not necessarily reach a constant value and exhibits periodic oscillations in most cases. Thus, the cooperation level is obtained by averaging over all time steps.
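
For illustration, a minimal, self-contained sketch of this protocol is given below. For brevity it applies the ordinary logit step of Eq. (2); in the actual model, the estimations \(E_i\) and \(E'_i\) of Eqs. (7) and (8) replace \(u_i\) and \(u'_i\) at the marked line. All helper names are our own.

```python
import numpy as np

rng = np.random.default_rng(0)
L, r, K, update_rate, steps = 12, 0.08, 0.01, 0.5, 5_000

# Equal numbers of C (1) and D (0) placed at random; no initial memory.
strategies = rng.permutation(np.repeat([0, 1], L * L // 2)).reshape(L, L)

def neighbors(x, y):
    """Von Neumann neighbors on a periodic lattice."""
    return [((x - 1) % L, y), ((x + 1) % L, y),
            (x, (y - 1) % L), (x, (y + 1) % L)]

def payoff(s, x, y, grid):
    """Normalized donation-game income of a site holding strategy s."""
    return sum(grid[nx, ny] - r * s for nx, ny in neighbors(x, y))

f_c = []
for t in range(steps):
    # Randomly choose the fraction of players that update this step.
    chosen = rng.choice(L * L, size=int(update_rate * L * L), replace=False)
    new = strategies.copy()
    for idx in chosen:
        x, y = divmod(idx, L)
        s = strategies[x, y]
        u, u_alt = payoff(s, x, y, strategies), payoff(1 - s, x, y, strategies)
        # Ordinary logit step (Eq. 2); the full rule would use E_i, E'_i here.
        p = 1.0 / (1.0 + np.exp(np.clip((u - u_alt) / K, -500, 500)))
        if rng.random() < p:
            new[x, y] = 1 - s
    strategies = new                  # synchronous update of the chosen players
    f_c.append(strategies.mean())

# The population oscillates rather than settling, so average over all steps.
print("time-averaged cooperation level:", np.mean(f_c))
```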

3.1 Environmental information

Figure 2a shows the cooperation level \(f_c\) in a \(12 \times 12\) lattice as a function of vision range for different values of r, with update rate = 0.5 and without the memory effect, obtained by taking a large value for the memory damping coefficient (\(\alpha = 10\)). Interestingly, the cooperation level does not change monotonically with vision range; instead, there is an optimal vision range for each value of r that maximizes the cooperation level. As can be seen, the optimal vision range is larger for lower values of r. A similar simulation for a larger lattice of size \(24 \times 24\) indicates that the optimal vision range depends on lattice size (see Fig. 2b, c). Moreover, it can be seen that in larger lattices, the maximum cooperation levels increase for the different values of r.

Fig. 2 Cooperation level as a function of vision range for different values of r, for lattice sizes of a 12 and b 24. These results are obtained for update rate = 0.5 and \(\alpha = 10\)

Fig. 3 Time variation of the cooperator population during 100 time steps for a vision range = 1, b vision range = 5, and c vision range = 10; d probability distribution of the cooperator population alteration for three important values of the vision range. These results are obtained for L = 12, update rate = 0.5, r = 0.08, and \(\alpha = 10\)

Fig. 4 a Cooperation level as a function of \(\alpha \) for a vision range of 0 and different values of r. b Cooperation level as a function of \(\alpha \) and vision range for r = 0.08

Putting these facts together, one can deduce that two competing factors are at work: the size of the vision range and the diversity in what different players observe. Stated differently, while monitoring more players' activity can favor cooperation, at very large vision ranges the overlap between players' observation groups decreases the difference between what different players observe, which appears to have an adverse effect on the cooperation level. In larger lattices, players can observe larger groups of players without overlapping with many other players' observation groups. When vision ranges are too large, players observe the same conditions as many other players do. Under these circumstances, most players will probably make the same decision at the same time, which is in some ways similar to the synchronization of oscillators driven by a common source. As a result, larger fluctuations arise, which increase the likelihood of the system being in full (or nearly full) cooperator or defector states. In the prisoner's dilemma game, the full (or nearly full) defector state is much more stable than the full (or nearly full) cooperator state. Therefore, excessively large vision ranges can cause the system to get stuck in full defector states for long periods of time, whereas full cooperator states are not long-lived. Figure 3 confirms this description. Figure 3d shows the probability of different values of cooperator population alteration occurring at each step for three important vision range values. As the vision range increases, the probability distribution shifts toward larger alteration values. This difference in the size of the cooperator population fluctuations can be clearly seen in Fig. 3a–c.

These graphs show how the cooperator population changes over time. One can clearly observe that for larger vision ranges, the amplitude of the oscillations increases, such that for a vision range of 10 the system alternates over a wide range of cooperator population values. In addition, as noted above, owing to the higher stability of states with a low cooperator population, the system spends a relatively long time in such states, which leads to a lower time-averaged cooperation level for too large vision ranges compared to intermediate values (see Fig. 3c).
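
The distribution of Fig. 3d can be estimated from a recorded time series of the cooperator count; a short sketch follows, where the function name and the binning are our own choices.

```python
import numpy as np

def alteration_distribution(coop_counts, bins=20):
    """Empirical distribution of the per-step change in the cooperator
    count; a heavier tail indicates more synchronized strategy switching."""
    deltas = np.abs(np.diff(np.asarray(coop_counts, dtype=float)))
    hist, edges = np.histogram(deltas, bins=bins, density=True)
    return hist, edges
```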

Fig. 5 a Time average of the standard deviation of different players' estimations of strategy profitability. b Time variation of the cooperator population during 10,000 time steps for a vision range of 6 and \(\alpha = 0.02\)

3.2 Memory

In what follows, we present the results obtained when players use past information for decision-making. As the damping coefficient \(\alpha \) decreases, players remember information from more distant past steps more clearly and use it in their estimation of strategy profitability and, consequently, in their decision-making. For the memory effect alone (vision range of 0), which corresponds to the conditions under which previous memory investigations were performed, the results, as expected, show a steady increase in the cooperation level with decreasing \(\alpha \) (see Fig. 4a). For each value of r, the major memory-induced change in the cooperation level occurs in a particular \(\alpha \) span. This span becomes smaller with increasing r, such that for \(r=0\) the cooperation level increases gradually over the whole range of \(\alpha \), while for \(r=0.20\) the main change occurs between \(\alpha \) values of 0.05 and 0.2.
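
To get a feel for these \(\alpha \) spans, the following lines print the relative weight \(e^{\alpha (\tau -t)}\) assigned to information \(k=t-\tau \) steps in the past. At \(\alpha =0.05\) the weight decays slowly (long memory), while at \(\alpha =1\) anything older than a few steps is effectively forgotten; the sampled k values are our own choice.

```python
import numpy as np

for alpha in (0.05, 0.2, 1.0):
    # Weights for information k = 0, 20, 40, ..., 100 steps in the past.
    weights = np.exp(-alpha * np.arange(0, 101, 20))
    print(f"alpha = {alpha}:", np.round(weights, 3))

# Approximate output:
# alpha = 0.05: [1.    0.368 0.135 0.05  0.018 0.007]
# alpha = 0.2:  [1.    0.018 0.    0.    0.    0.   ]
# alpha = 1.0:  [1.    0.    0.    0.    0.    0.   ]
```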

3.3 Environmental information and memory

When data collection from neighbors is combined with data collection from previous steps, the system can reach higher cooperation levels for appropriate values of \(\alpha \) and vision range (see Fig. 4b). More importantly, their effects on the cooperation level are interdependent. For vision range values less than 2, the cooperation level appears to be simply the superposition of the vision range and memory effects, while for higher vision ranges, up to 7, the dependence of the cooperation level on memory clearly changes. In this vision range span, the effect of the damping coefficient on the cooperation level is no longer monotonic. Unlike before, lower values of the memory damping coefficient do not necessarily benefit the cooperation level. Instead, the best cooperation level is obtained for intermediate values of \(\alpha \), and the optimal \(\alpha \) range gets narrower with increasing vision range. Finally, for too large values of the vision range, the cooperation level is significantly reduced, no matter what the damping coefficient is.

The reason for this behavior can be traced to the same phenomenon that leads to the drop in cooperation level at large vision ranges, discussed previously: the similarity between different players' estimations of strategy profitability. But why can longer memory increase such similarity? To address this question, note that in the simulations of the present study, all conditions, including network properties, decision-making behavior, and noise level, are identical for all players and across the whole society.

Under these circumstances, it is to be expected that various regions of the society experience somewhat similar series of events and population compositions over time. Although for very small regions the formation of such similarity between different regions is very unlikely within a finite number of time steps, for larger regions, which correspond to larger vision ranges, this possibility is no longer negligible.

Indeed, considering that in the presence of memory, players' estimations of strategy profitability are rooted in what they observe during the time period they remember, different players' estimations are expected to become closer as memory length increases at large enough vision ranges. The closer the players' estimations of strategy profitability, the higher the similarity between the strategies they choose. This leads to synchronization of players' choices, which can be destructive for the cooperation level in society, despite the positive effect that memory could have on players' estimation of cooperation profitability. As in the case of vision range alone, the negative effect of memory on the cooperation level outweighs its positive effect when players remember too long a period of time, which is equivalent to lowering the damping coefficient below a critical value. Owing to the intensifying effect of vision range on the similarity between different players' memories, as already mentioned, the critical value of the damping coefficient increases as the vision range goes up (see Fig. 4b).

The similarity between different players' estimations of strategy profitability can be measured by its standard deviation across players. The time-averaged value of this quantity is calculated for all values of the damping coefficient and some important values of the vision range; the results are shown in Fig. 5a. As expected, the standard deviation decreases with increasing vision range and decreasing damping coefficient. As discussed, one consequence of such a pronounced similarity is large fluctuations in the cooperator population; the diagram shown in Fig. 5b is an example of such fluctuations.
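
Operationally, the quantity plotted in Fig. 5a can be computed as follows, assuming the estimations \(E_i(t)\) of all N players are recorded over T steps; the array layout is our own convention.

```python
import numpy as np

def estimation_similarity(E_series):
    """E_series has shape (T, N): one profitability estimation per player per
    step. Returns the time average of the cross-player standard deviation;
    smaller values mean players effectively observe the same conditions."""
    E = np.asarray(E_series, dtype=float)
    return E.std(axis=1).mean()
```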

Although the higher stability of the full or nearly full defection state compared to the full or nearly full cooperation state is rooted in the nature of the donation game, the situation can be explored in more detail, especially in the presence of memory. Contrary to what one may expect, a full cooperator society is not an ideal situation, because it is prone to a sudden collapse to the full defector state. The negative memory of defection, which previously caused players to shift their strategy to cooperation and brought about such a thriving cooperating state, does not last. A full cooperator society collapses as soon as players forget the negative memory of defection. While the absence of defectors in society can cause the negative consequences of defection to be forgotten, the presence of a few of them may lead to the formation of defector colonies, which constantly remind players how harmful defection can be and prevent cooperation in society from collapsing.

The unfavorable conditions of the fully defector society that forms after such a thriving cooperation period are not immediately or easily reversible. Despite the positive memory of cooperation that players retain from the previous fully cooperating state, the deceptively favorable memory of defection formed during the transition period prevents a return to cooperation. Over time, although the negative memory of defection in the fully defector society replaces the positive memory of defection formed in the transition period, the positive memory of cooperation formed in the thriving period is simultaneously forgotten. In such a situation, although the players are not satisfied with their condition, unlike in the full cooperator case, society does not move quickly and easily toward cooperation. This is because, in the absence of a favorable memory of cooperation, no player has any incentive to change strategy from defection to cooperation. In this case, the formation of a small colony of cooperators as a result of irrational decision-making can trigger a very fast and sudden movement of society toward cooperation. Unfortunately, such an event takes a long time to happen, which causes the society to stay stuck in the unfavorable full defector state for longer periods of time than it spends in the full cooperator state.

4 Conclusion

In summary, we have studied the evolution of cooperation under the donation game in a society where players update their strategies based on a modified version of the logit update rule, using a wider source of information in their decision-making process. The information comes from each player monitoring his own and other players' activity in the current and previous time steps. In our simulation, the size of the information-gathering area and the players' memory power are scaled by two parameters: the vision range and the damping coefficient, respectively. The simulations of the present study show that, as expected, using both extra sources of information increases the cooperation level in society by providing players with more realistic estimations of strategy profitability. However, gathering information over too large an area and over too long a period of time brings about an appreciable similarity between different players' estimations of strategy profitability. Such similarity causes synchronization of players' decision-making behavior, which leads to large fluctuations in the cooperator level, and consequently society gets stuck in full or nearly full defector states.

As a result, in most cases, increasing the vision range and decreasing the damping coefficient do not increase the cooperation level monotonically. Instead, there are optimal values of these two parameters that maximize the cooperation level, and those values are interdependent in such a way that the optimal value of the damping coefficient increases with increasing vision range. In view of this non-monotonic effect of increasing memory power on the cooperation level, our results differ from many previous investigations, which show that increasing memory power is always in favor of cooperation.