Modeling the Formation of Social Conventions from Embodied Real-Time Interactions


arXiv - CS - Multiagent Systems. Pub Date: 2018-02-16, DOI: arxiv-1802.06108
Ismael T. Freire, Clement Moulin-Frier, Marti Sanchez-Fibla, Xerxes D. Arsiwalla, Paul Verschure

What is the role of real-time control and learning in the formation of social conventions? To answer this question, we propose a computational model that matches human behavioral data in a social decision-making game analyzed in both discrete-time and continuous-time setups. Unlike previous approaches, our model takes into account the role of sensorimotor control loops in embodied decision-making scenarios. For this purpose, we introduce the Control-based Reinforcement Learning (CRL) model. CRL is grounded in the Distributed Adaptive Control (DAC) theory of mind and brain, in which low-level sensorimotor control is modulated through perceptual and behavioral learning in a layered structure. CRL follows these principles by implementing a feedback control loop that handles the agent's reactive behaviors (pre-wired reflexes), along with an adaptive layer that uses reinforcement learning to maximize long-term reward. We test our model in a multi-agent game-theoretic task in which coordination must be achieved to find an optimal solution. We show that CRL reaches human-level performance on standard game-theoretic metrics such as efficiency in acquiring rewards and fairness in reward distribution.
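
As a rough illustration of the layered idea described in the abstract, the Python sketch below pairs a reactive feedback loop with a simple reward-driven learner. It is not the authors' CRL implementation: the one-dimensional task, the names `reactive_step`, `AdaptiveLayer` and `run_episode`, the gain, learning rate and exploration rate are all assumptions made for illustration. The `efficiency` and `fairness` helpers use one common way of defining these metrics (collected reward over maximum possible reward, and lower payoff over higher payoff), which may differ from the definitions used in the paper.

```python
import random


def reactive_step(position, target, gain=0.5):
    """Reactive layer (assumed): proportional feedback toward a target."""
    error = target - position
    return position + gain * error


class AdaptiveLayer:
    """Adaptive layer (assumed): epsilon-greedy value learner over targets."""

    def __init__(self, n_targets, lr=0.1, epsilon=0.1, seed=0):
        self.values = [0.0] * n_targets
        self.lr = lr
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def choose(self):
        # Explore with probability epsilon, otherwise pick the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda i: self.values[i])

    def update(self, target_index, reward):
        # Move the value estimate a fraction lr toward the observed reward.
        self.values[target_index] += self.lr * (reward - self.values[target_index])


def run_episode(learner, targets, rewards, steps=20, tol=0.05):
    """The learner picks a target; the control loop steers the agent to it."""
    choice = learner.choose()
    position = 0.0
    for _ in range(steps):
        position = reactive_step(position, targets[choice])
        if abs(position - targets[choice]) < tol:
            break
    reached = abs(position - targets[choice]) < tol
    reward = rewards[choice] if reached else 0.0
    learner.update(choice, reward)
    return choice, reward


def efficiency(payoff_a, payoff_b, max_total):
    """Share of the maximum attainable joint reward that was collected."""
    return (payoff_a + payoff_b) / max_total


def fairness(payoff_a, payoff_b):
    """Lower payoff divided by higher payoff; 1.0 means an even split."""
    high = max(payoff_a, payoff_b)
    return 1.0 if high == 0 else min(payoff_a, payoff_b) / high


if __name__ == "__main__":
    learner = AdaptiveLayer(n_targets=2)
    for _ in range(500):
        run_episode(learner, targets=[-1.0, 1.0], rewards=[1.0, 4.0])
    print("learned values:", learner.values)
    print("fairness of a 2 vs 4 split:", fairness(2, 4))
```

In this toy setup the control loop handles the moment-to-moment movement, while the learner only decides which target to pursue; this division of labour is the point of the sketch, not a claim about how the paper structures its agents.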


Updated: 2020-01-03
