Cooperation and Reputation Dynamics with Reinforcement Learning
arXiv - CS - Multiagent Systems. Pub Date: 2021-02-15, arXiv:2102.07523
Nicolas Anastassacos, Julian García, Stephen Hailes, Mirco Musolesi

Creating incentives for cooperation is a challenge in natural and artificial systems. One potential answer is reputation, whereby agents trade the immediate cost of cooperation for the future benefits of having a good reputation. Game-theoretic models have shown that specific social norms can make cooperation stable, but how agents can independently learn to establish effective reputation mechanisms is less well understood. We use a simple model of reinforcement learning to show that reputation mechanisms generate two coordination problems: agents need to learn how to coordinate on the meaning of existing reputations and to collectively agree on a social norm for assigning reputations to others based on their behavior. These coordination problems exhibit multiple equilibria, some of which effectively establish cooperation. When we train agents with a standard Q-learning algorithm in an environment with reputation mechanisms, convergence to undesirable equilibria is widespread. We propose two mechanisms to alleviate this: (i) seeding a proportion of the system with fixed agents that steer others towards good equilibria; and (ii) intrinsic rewards based on the idea of introspection, i.e., augmenting agents' rewards by an amount proportionate to the performance of their own strategy against themselves. A combination of these simple mechanisms successfully stabilizes cooperation, even in a fully decentralized version of the problem where agents learn to use and assign reputations simultaneously. We show how our results relate to the literature on Evolutionary Game Theory, and discuss implications for artificial, human, and hybrid systems in which reputations can be used as a way to establish trust and cooperation.
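A minimal sketch may help make the introspection mechanism (ii) concrete. The code below assumes a pairwise donation game with binary reputations, tabular Q-learning keyed on the partner's reputation, and a fixed image-scoring norm (cooperating earns a good reputation); the class name IntrospectiveQLearner, the parameter beta, and the payoff values are illustrative assumptions, not details taken from the paper.

import random
from collections import defaultdict

COOPERATE, DEFECT = 0, 1
BENEFIT, COST = 2.0, 1.0  # assumed donation-game payoffs (benefit > cost)

def payoff(own_action, other_action):
    # Donation game: you gain BENEFIT if your partner cooperates
    # and pay COST if you cooperate yourself.
    gain = BENEFIT if other_action == COOPERATE else 0.0
    loss = COST if own_action == COOPERATE else 0.0
    return gain - loss

class IntrospectiveQLearner:
    def __init__(self, alpha=0.1, epsilon=0.1, beta=0.5):
        self.q = defaultdict(float)  # Q[(partner_reputation, action)]
        self.alpha, self.epsilon, self.beta = alpha, epsilon, beta

    def act(self, partner_rep):
        if random.random() < self.epsilon:
            return random.choice((COOPERATE, DEFECT))
        return max((COOPERATE, DEFECT), key=lambda a: self.q[(partner_rep, a)])

    def update(self, partner_rep, action, extrinsic):
        # Introspection: augment the extrinsic reward with beta times the
        # payoff the agent's current greedy strategy earns against a copy
        # of itself (a simplification of the paper's intrinsic reward).
        self_play = max((COOPERATE, DEFECT), key=lambda a: self.q[(partner_rep, a)])
        reward = extrinsic + self.beta * payoff(self_play, self_play)
        self.q[(partner_rep, action)] += self.alpha * (reward - self.q[(partner_rep, action)])

# Toy training loop: random pairing, reputations assigned by a fixed
# image-scoring norm (cooperators are marked good).
agents = [IntrospectiveQLearner() for _ in range(20)]
reps = [1] * len(agents)  # 1 = good, 0 = bad
for _ in range(5000):
    i, j = random.sample(range(len(agents)), 2)
    a_i, a_j = agents[i].act(reps[j]), agents[j].act(reps[i])
    agents[i].update(reps[j], a_i, payoff(a_i, a_j))
    agents[j].update(reps[i], a_j, payoff(a_j, a_i))
    reps[i], reps[j] = int(a_i == COOPERATE), int(a_j == COOPERATE)

Under this intrinsic bonus, a strategy that would defect against itself forgoes beta times the mutual-cooperation surplus (here BENEFIT - COST), which is one way to see why introspection can tilt learning towards the cooperative equilibria discussed above.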

Updated: 2021-02-16