On the Equilibrium Elicitation of Markov Games Through Information Design, arXiv - CS - Multiagent Systems

arXiv - CS - Multiagent Systems. Pub Date: 2021-02-14, DOI: arxiv-2102.07152
Tao Zhang, Quanyan Zhu

This work considers a novel information design problem and studies how crafting payoff-relevant environmental signals alone can influence the behavior of intelligent agents. The agents' strategic interactions are captured by an incomplete-information Markov game in which each agent first selects one environmental signal from multiple signal sources as additional payoff-relevant information and then takes an action. A rational information designer (the designer) possesses one signal source and aims to control the agents' equilibrium behavior by designing the information structure of the signals she sends to them. We establish an obedient principle, which states that it is without loss of generality to focus on direct information design when the design incentivizes each agent to select the signal sent by the designer, so that the design process need not predict the agents' strategic selection behavior. We then introduce the design protocol for a given goal of the designer, referred to as obedient implementability (OIL), and characterize OIL in a class of obedient perfect Bayesian Markov Nash equilibria (O-PBME). We propose a new framework for information design based on maximizing the optimal slack variables. Finally, we formulate the designer's goal selection problem and characterize it in terms of information design by establishing a relationship between the O-PBME and the Bayesian Markov correlated equilibria, building upon the revelation principle of classic information design in economics. The proposed approach can be applied to elicit desired behaviors of multi-agent systems in competitive as well as cooperative settings, and it can be extended to heterogeneous stochastic games in both complete- and incomplete-information environments.
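To make the idea of obedience concrete, the sketch below checks the standard static obedience constraint from classic information design in a toy setting with one agent, two states, and two actions. It is an illustrative simplification only and is not the paper's Markov-game O-PBME formulation: the function `is_obedient`, the state and action labels, the prior, and the payoff function are all assumptions introduced here. A direct signal recommends an action, and it is obedient if, for every recommendation, following it is at least as good in expectation as any deviation.

```python
from fractions import Fraction as F

def is_obedient(prior, policy, utility, actions):
    """Check static obedience constraints for a direct (action-recommending) signal.

    prior[w]      : probability of state w (hypothetical toy prior)
    policy[w][a]  : probability of recommending action a in state w
    utility(a, w) : agent's payoff from action a in state w
    """
    for rec in actions:          # recommended action
        for dev in actions:      # candidate deviation
            # Unnormalized expected gain from obeying `rec` instead of playing `dev`.
            gain = sum(
                prior[w] * policy[w][rec] * (utility(rec, w) - utility(dev, w))
                for w in prior
            )
            if gain < 0:
                return False
    return True

# Toy data (all assumed for illustration): the agent wants to match the state.
prior = {"innocent": F(7, 10), "guilty": F(3, 10)}
actions = ["acquit", "convict"]

def utility(action, state):
    return 1 if (action == "convict") == (state == "guilty") else 0

# Recommends "convict" always when guilty and with probability 3/7 when innocent.
partially_revealing = {
    "guilty": {"acquit": F(0), "convict": F(1)},
    "innocent": {"acquit": F(4, 7), "convict": F(3, 7)},
}
# Recommends "convict" regardless of the state, so the signal is uninformative.
always_convict = {
    "guilty": {"acquit": F(0), "convict": F(1)},
    "innocent": {"acquit": F(0), "convict": F(1)},
}

print(is_obedient(prior, partially_revealing, utility, actions))
print(is_obedient(prior, always_convict, utility, actions))
```

Exact rational arithmetic (`Fraction`) is used because the first policy makes the agent exactly indifferent after a "convict" recommendation, and floating-point rounding could flip that boundary case. In the paper's setting the analogous constraints would also have to account for dynamics and for the agents' choice among signal sources, which this sketch does not model.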

Updated: 2021-02-16
