A Predictive Strategy for the Iterated Prisoner's Dilemma, arXiv - CS - Computer Science and Game Theory

A Predictive Strategy for the Iterated Prisoner's Dilemma
arXiv - CS - Computer Science and Game Theory, Pub Date: 2020-09-03, DOI: arxiv-2009.01668
Robert Prentner

The iterated prisoner's dilemma is a game that, from very simple rules, produces many counter-intuitive and complex behaviors in a social environment. It illustrates three points: cooperation can be beneficial even in a competitive world; individual fitness need not be the most important criterion of success; and some strategies are very strong in direct confrontation but can still perform poorly on average or be evolutionarily unstable. In this contribution, we present a strategy, PREDICTOR, that appears to be "sentient". It cooperates when playing against some strategies and defects against others. It does this without recording "tags" for its opponents and without an elaborate decision-making mechanism. To operate in the highly contextual environment modeled by the iterated prisoner's dilemma, PREDICTOR learns from experience to choose optimal actions by modeling its opponent and predicting a (fictive) future. We show that PREDICTOR is an efficient strategy for playing the iterated prisoner's dilemma and is simple to implement. In a simulated and representative tournament, it achieves high average scores and wins the tournament for various parameter settings. PREDICTOR relies on a brief phase of exploration to improve its model, and it can evolve morality from intrinsically selfish behavior.
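The abstract describes PREDICTOR only at a high level. It models its opponent, predicts a fictive future, and picks actions after a brief exploration phase. The Python sketch below illustrates that idea in a minimal, hypothetical form. It is not the paper's algorithm, and every concrete choice in it is an assumption:

- the payoff values (the common T=5, R=3, P=1, S=0);
- a memory-one opponent model built from Laplace-smoothed counts;
- an expectimax look-ahead over `horizon` fictive rounds;
- a few random exploration rounds;
- the names `PredictiveSketch` and `play_match`.

```python
import random
from itertools import product

# Assumed payoffs (the common Axelrod values T=5, R=3, P=1, S=0),
# keyed by (my_move, their_move) -> my payoff.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


class PredictiveSketch:
    """Hypothetical model-based IPD player, loosely inspired by PREDICTOR.

    Opponent model (assumption): memory-one, i.e. P(opponent plays C)
    given the previous round's (my_move, their_move), estimated from
    Laplace-smoothed counts. Planning (assumption): expectimax over a
    fictive future of `horizon` rounds using that model.
    """

    def __init__(self, horizon=3, explore_rounds=4, seed=0):
        self.horizon = horizon
        self.explore_rounds = explore_rounds
        self.rng = random.Random(seed)
        # state -> [times opponent cooperated, times state was observed]
        self.counts = {s: [1, 2] for s in product("CD", repeat=2)}
        self.history = []  # (my_move, their_move) per round

    def p_coop(self, state):
        cooperated, seen = self.counts[state]
        return cooperated / seen

    def q(self, state, my_move, depth):
        """Expected payoff of playing my_move now, then planning optimally."""
        p = self.p_coop(state)
        return sum(
            prob * (PAYOFF[(my_move, t)] + self.value((my_move, t), depth - 1))
            for t, prob in (("C", p), ("D", 1 - p))
        )

    def value(self, state, depth):
        """Best expected payoff over `depth` fictive rounds from `state`."""
        if depth == 0:
            return 0.0
        return max(self.q(state, m, depth) for m in "CD")

    def move(self):
        if not self.history or len(self.history) < self.explore_rounds:
            return self.rng.choice("CD")  # brief exploration phase
        state = self.history[-1]
        return max("CD", key=lambda m: self.q(state, m, self.horizon))

    def observe(self, my_move, their_move):
        """Update the opponent model with the round just played."""
        if self.history:
            prev = self.history[-1]  # state the opponent was reacting to
            self.counts[prev][0] += int(their_move == "C")
            self.counts[prev][1] += 1
        self.history.append((my_move, their_move))


# A few classic opponents; each sees only the sketch player's past moves.
def tit_for_tat(their_moves):
    return their_moves[-1] if their_moves else "C"


def always_defect(_their_moves):
    return "D"


def always_cooperate(_their_moves):
    return "C"


def play_match(opponent, rounds=200, seed=0):
    """Return (sketch_score, opponent_score) after `rounds` rounds."""
    me = PredictiveSketch(seed=seed)
    my_moves, my_score, opp_score = [], 0, 0
    for _ in range(rounds):
        a, b = me.move(), opponent(my_moves)
        me.observe(a, b)
        my_moves.append(a)
        my_score += PAYOFF[(a, b)]
        opp_score += PAYOFF[(b, a)]
    return my_score, opp_score


if __name__ == "__main__":
    for opp in (tit_for_tat, always_defect, always_cooperate):
        print(opp.__name__, play_match(opp))
```

Running the file prints each match's scores. The exact numbers depend on the seed and on the assumed parameters.

With `horizon=1`, the planner always defects, because D strictly dominates within a single round. With a longer horizon, things change once the model has learned Tit-for-Tat's reciprocity: the imagined retaliation in the fictive future makes cooperation the better move. Against a model of an unconditional defector or cooperator, the planner still defects.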

Updated: 2020-09-07
