Inducing Cooperative behaviour in Sequential-Social dilemmas through Multi-Agent Reinforcement Learning using Status-Quo Loss
arXiv - CS - Computer Science and Game Theory. Pub Date: 2020-01-15, DOI: arxiv-2001.05458
Pinkesh Badjatiya, Mausoom Sarkar, Abhishek Sinha, Siddharth Singh, Nikaash Puri, Jayakumar Subramanian, Balaji Krishnamurthy

In social dilemma situations, individual rationality leads to sub-optimal group outcomes. Several human engagements can be modeled as sequential (multi-step) social dilemmas. However, in contrast to humans, Deep Reinforcement Learning agents trained to optimize individual rewards in sequential social dilemmas converge to selfish, mutually harmful behavior. We introduce a status-quo loss (SQLoss) that encourages an agent to stick to the status quo, rather than repeatedly changing its policy. We show how agents trained with SQLoss evolve cooperative behavior in several social dilemma matrix games. To work with social dilemma games that have visual input, we propose GameDistill, which uses self-supervision and clustering to automatically extract cooperative and selfish policies from a social dilemma game. We combine GameDistill and SQLoss to show how agents evolve socially desirable cooperative behavior in the Coin Game.
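To make the status-quo idea concrete, below is a minimal, hypothetical Python sketch, not the authors' implementation: two REINFORCE agents play the iterated Prisoner's Dilemma, and each per-step return is augmented with the discounted reward from imagining the current joint action repeated for kappa further steps. The payoff values and the names kappa and sq_weight are illustrative assumptions; in the paper the imagined status-quo rollouts run inside the full sequential game.

import numpy as np

# Toy sketch of a status-quo-style augmented return (assumption: a
# simplified stand-in for SQLoss, not the paper's formulation).
# Actions: 0 = cooperate, 1 = defect. Row-player payoffs:
PAYOFF = np.array([[3.0, 0.0],
                   [4.0, 1.0]])

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def imagined_sq_return(a_self, a_other, kappa, gamma):
    # Discounted reward from repeating the current joint action
    # for kappa imagined steps (the "status quo" rollout).
    r = PAYOFF[a_self, a_other]
    return sum((gamma ** k) * r for k in range(kappa))

rng = np.random.default_rng(0)
theta = np.zeros(2)                      # one cooperation logit per agent
lr, gamma, kappa, sq_weight = 0.05, 0.96, 10, 1.0

for episode in range(2000):
    T = 20
    actions = np.zeros((T, 2), dtype=int)
    rewards = np.zeros((T, 2))
    for t in range(T):
        for i in range(2):
            actions[t, i] = 0 if rng.random() < sigmoid(theta[i]) else 1
        rewards[t, 0] = PAYOFF[actions[t, 0], actions[t, 1]]
        rewards[t, 1] = PAYOFF[actions[t, 1], actions[t, 0]]
    # REINFORCE update on the status-quo-augmented return.
    for i in range(2):
        G = 0.0
        for t in reversed(range(T)):
            G = rewards[t, i] + gamma * G
            G_sq = imagined_sq_return(actions[t, i], actions[t, 1 - i],
                                      kappa, gamma)
            p = sigmoid(theta[i])
            grad_logp = (1 - p) if actions[t, i] == 0 else -p
            theta[i] += lr * grad_logp * (G + sq_weight * G_sq) / T

print("P(cooperate):", sigmoid(theta))

The sketch only illustrates how an imagined status-quo return can be folded into a policy-gradient update; the cooperative equilibria reported in the paper rely on the sequential structure of the games, which this stateless toy does not capture.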

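GameDistill's pipeline (self-supervised encoding of trajectory segments, then clustering into cooperative vs. selfish behaviour) can be sketched at its clustering stage as follows. The embeddings below are synthetic stand-ins; in the paper they come from an encoder trained with self-supervision on Coin Game trajectories, and each resulting cluster is then distilled into an oracle policy.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Stand-in embeddings for trajectory segments (assumption: two separable
# behaviour modes, e.g. "collects own coin" vs "collects the other
# agent's coin" in the Coin Game).
coop = rng.normal(loc=0.0, scale=0.5, size=(100, 8))
selfish = rng.normal(loc=3.0, scale=0.5, size=(100, 8))
X = np.vstack([coop, selfish])

# Unsupervised clustering separates the behaviour modes; each cluster
# would then be distilled into a cooperative or selfish action oracle,
# between which an SQLoss-trained agent chooses.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("cluster sizes:", np.bincount(labels))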
Updated: 2020-02-14