A multi-armed bandit algorithm speeds up the evolution of cooperation
Ecological Modelling (IF 2.6), Pub Date: 2021-01-01, DOI: 10.1016/j.ecolmodel.2020.109348
Roberto Cazzolla Gatti

Abstract Most evolutionary biologists consider selfishness an intrinsic feature of our genes and the best choice in social situations. In recent years, prolific research has been conducted on the mechanisms that allow cooperation to emerge "in a world of defectors" and become an evolutionarily stable strategy. A major debate started with W.D. Hamilton's proposal of "kin selection", framed in terms of the cost sustained by cooperators and the benefit received by related conspecifics. Since then, four other main rules for the evolution of cooperation have been suggested. However, one of the main problems with these five rules is the assumption that the payoffs obtained by either cooperating or defecting are well known to the parties before they interact and do not change over time or after repeated encounters. This is not always the case in real life. By following each rule blindly, individuals risk getting stuck in an unfavorable situation. Axelrod (1984) highlighted that the main problem is how to obtain the benefits of cooperation without passing through many slow and painful trials and errors. With a better understanding of this process, individuals can use their foresight to speed up the evolution of cooperation. Here I show that a multi-armed bandit (MAB) model, a classic problem in decision sciences, is naturally employed by individuals to opt for the best choice most of the time, accelerating the evolution of altruistic behavior and solving the problems outlined above. A common MAB model that applies extremely well to the evolution of cooperation is the epsilon-greedy (ε-greedy) algorithm. After an initial period of exploration (which can be considered as biological history), this algorithm greedily exploits the best option ε% of the time and explores the other options the remaining (1 − ε)% of the time. Through the epsilon-greedy decision-making algorithm, cooperation evolves as a multilevel process nested in the hierarchical levels that exist among the five rules for the evolution of cooperation. This form of reinforcement learning, a subfield of artificial intelligence based on trial and error, provides a powerful tool to better understand, and even probabilistically quantify, the chances that cooperation has to evolve in a specific situation.
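The ε-greedy rule described in the abstract can be sketched in a few lines of code. The Python snippet below is a minimal illustration, not the model from the paper: it treats "cooperate" and "defect" as the two arms of a bandit with unknown, noisy payoffs, and it follows the standard textbook convention in which ε is the small exploration probability. The payoff distributions, the value of epsilon, and all function names are illustrative assumptions.

```python
import random

def epsilon_greedy_cooperation(payoff, n_rounds=1000, epsilon=0.1, seed=0):
    """Minimal epsilon-greedy bandit over two 'arms': cooperate or defect.

    payoff: function(action) -> stochastic reward for that action.
    With probability epsilon the agent explores a random action;
    otherwise it exploits the action with the highest estimated mean payoff.
    """
    rng = random.Random(seed)
    actions = ["cooperate", "defect"]
    counts = {a: 0 for a in actions}
    estimates = {a: 0.0 for a in actions}

    for _ in range(n_rounds):
        if rng.random() < epsilon:                  # explore a random strategy
            action = rng.choice(actions)
        else:                                       # exploit the current best estimate
            action = max(actions, key=lambda a: estimates[a])
        reward = payoff(action)
        counts[action] += 1
        # incremental update of the running mean payoff for the chosen action
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


if __name__ == "__main__":
    rng = random.Random(42)

    # Hypothetical noisy payoffs in which cooperation pays more on average
    # (illustrative numbers only, not taken from the paper).
    def payoff(action):
        mean = 3.0 if action == "cooperate" else 2.0
        return rng.gauss(mean, 1.0)

    estimates, counts = epsilon_greedy_cooperation(payoff)
    print(estimates)  # estimated mean payoff per strategy
    print(counts)     # how often each strategy was chosen
```

Under assumptions like these, the running estimates concentrate on the more rewarding strategy after relatively few interactions, which is the sense in which a bandit policy can "speed up" the discovery of cooperation compared with blind trial and error.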

Updated: 2021-01-01