Theory of Mind with Guilt Aversion Facilitates Cooperative Reinforcement Learning,arXiv - CS - Artificial Intelligence



arXiv - CS - Artificial Intelligence. Pub Date: 2020-09-16, DOI: arxiv-2009.07445
Dung Nguyen, Svetha Venkatesh, Phuoc Nguyen, Truyen Tran

Guilt aversion causes people to experience a utility loss when they believe they have disappointed others, and this promotes cooperative behaviour in humans. In psychological game theory, guilt aversion requires modelling agents that hold a theory about what other agents think, an ability known as Theory of Mind (ToM). We aim to build a new kind of affective reinforcement learning agent, called Theory of Mind Agents with Guilt Aversion (ToMAGA), which is equipped with the ability to think about the wellbeing of others rather than only its own interest. To validate the agent design, we use a general-sum game known as Stag Hunt as a test bed. Because standard reinforcement learning agents can learn suboptimal policies in social dilemmas such as Stag Hunt, we propose to use belief-based guilt aversion as a reward shaping mechanism. We show that our belief-based guilt-averse agents can efficiently learn cooperative behaviours in Stag Hunt games.
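
To make the idea of belief-based guilt aversion as reward shaping concrete, here is a minimal sketch on a one-shot matrix Stag Hunt. It is not the ToMAGA implementation: the payoff values, the function names, the guilt sensitivity `theta`, and the "simple guilt" form (material payoff minus `theta` times the other player's disappointment relative to what they expected) are all assumptions chosen for illustration, and the abstract does not specify the exact formulation used in the paper.

```python
# Illustrative sketch only; payoffs and names below are assumptions.
STAG, HARE = 0, 1

# Hypothetical Stag Hunt payoffs, keyed by (own action, other's action).
# They satisfy the usual ordering: (S,S) > (H,S) >= (H,H) > (S,H).
PAYOFF = {
    (STAG, STAG): 4.0,
    (STAG, HARE): 0.0,
    (HARE, STAG): 3.0,
    (HARE, HARE): 1.0,
}


def material_payoff(own_action, other_action):
    """Material (unshaped) payoff to the player choosing own_action."""
    return PAYOFF[(own_action, other_action)]


def expected_payoff(own_action, prob_other_stag):
    """Expected material payoff of own_action when the other player
    is believed to play STAG with probability prob_other_stag."""
    return (prob_other_stag * PAYOFF[(own_action, STAG)]
            + (1.0 - prob_other_stag) * PAYOFF[(own_action, HARE)])


def guilt_shaped_reward(my_action, other_action, second_order_belief, theta):
    """Material payoff minus a guilt penalty (assumed 'simple guilt' form).

    second_order_belief: my estimate of the probability that the other
        player assigns to me playing STAG (a belief about their belief).
    theta: hypothetical guilt sensitivity; theta = 0 gives a selfish agent.
    """
    my_payoff = material_payoff(my_action, other_action)
    other_payoff = material_payoff(other_action, my_action)
    # What the other player expected to earn, given what I think they
    # believed about my behaviour.
    other_expected = expected_payoff(other_action, second_order_belief)
    # Guilt arises only when the other player ends up worse off than expected.
    disappointment = max(0.0, other_expected - other_payoff)
    return my_payoff - theta * disappointment


if __name__ == "__main__":
    for action, name in ((STAG, "STAG"), (HARE, "HARE")):
        shaped = guilt_shaped_reward(action, STAG, second_order_belief=1.0, theta=2.0)
        print(f"I play {name}, other plays STAG: shaped reward = {shaped}")
```

Under these assumed numbers, defecting to HARE against a partner who plays STAG and fully expects cooperation yields a material payoff of 3 but a shaped reward well below the payoff from cooperating, which is the sense in which the guilt term can steer a learner toward the cooperative outcome.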


Updated: 2020-09-17
