Stochastic Control Approach to Reputation Games,arXiv - CS - Computer Science and Game Theory

Stochastic Control Approach to Reputation Games
arXiv - CS - Computer Science and Game Theory. Pub Date: 2016-04-01, DOI: arxiv-1604.00299
Nuh Aygün Dalkıran and Serdar Yüksel

Through a stochastic control theoretic approach, we analyze reputation games where a strategic long-lived player acts in a sequential repeated game against a collection of short-lived players. The key assumption in our model is that the information of the short-lived players is nested in that of the long-lived player. This nested information structure is obtained through an appropriate monitoring structure. Under this monitoring structure, we show that, given mild assumptions, the set of Perfect Bayesian Equilibrium payoffs coincides with the set of Markov Perfect Equilibrium payoffs, and hence a dynamic programming formulation can be obtained for the computation of equilibrium strategies of the strategic long-lived player in the discounted setup. We also consider the undiscounted average-payoff setup, where we obtain an optimal equilibrium strategy of the strategic long-lived player under further technical conditions. We then use this optimal strategy in the undiscounted setup as a tool to obtain a tight upper payoff bound for the arbitrarily patient long-lived player in the discounted setup. Finally, by using measure concentration techniques, we obtain a refined lower payoff bound on the value of reputation in the discounted setup. We also study the continuity of equilibrium payoffs in the prior beliefs.

Updated: 2020-01-22
