Regret minimization in online Bayesian persuasion: Handling adversarial receiver's types under full and partial feedback models
Artificial Intelligence (IF 14.4). Pub Date: 2022-11-12. DOI: 10.1016/j.artint.2022.103821
Matteo Castiglioni, Andrea Celli, Alberto Marchesi, Nicola Gatti

In Bayesian persuasion, an informed sender has to design a signaling scheme that discloses the right amount of information so as to influence the behavior of a self-interested receiver. This kind of strategic interaction is ubiquitous in real-world economic scenarios. However, the seminal model by Kamenica and Gentzkow makes some stringent assumptions that limit its applicability in practice. One of the most limiting assumptions is, arguably, that the sender is required to know the receiver's utility function in order to compute an optimal signaling scheme. We relax this assumption through an online learning framework in which the sender repeatedly faces a receiver whose type is unknown and chosen adversarially at each round from a finite set of possible types. We are interested in no-regret algorithms that prescribe a signaling scheme at each round of the repeated interaction, with performance close to that of a best-in-hindsight signaling scheme. First, we prove a hardness result on the per-round running time required to achieve no-α-regret for any α<1. Then, we provide algorithms for the full and partial feedback models with regret bounds sublinear in the number of rounds and polynomial in the size of the instance. Finally, we show that, by relaxing the persuasiveness constraints on signaling schemes, it is possible to design an algorithm with a better running time and small regret.
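To make the commitment structure of Bayesian persuasion concrete, the following toy sketch works through the classic prosecutor–judge example often used to illustrate the Kamenica–Gentzkow model. It is not the paper's online algorithm; the specific numbers (prior 0.3, conviction threshold 0.5) are illustrative assumptions. The sender commits to a signaling scheme that sends "convict" whenever the state is guilty, and with just enough probability when the state is innocent that the receiver's posterior still meets the threshold (the persuasiveness constraint holds with equality):

```python
# Toy Bayesian persuasion sketch (prosecutor-judge style example).
# All numbers are illustrative assumptions, not from the paper.

prior_guilty = 0.3   # prior probability that the state is "guilty"
threshold = 0.5      # receiver convicts iff P(guilty | signal) >= 0.5

# When guilty, always send "convict". When innocent, send "convict"
# with probability q chosen so the posterior after "convict" exactly
# equals the receiver's threshold (binding persuasiveness constraint):
#   prior / (prior + (1 - prior) * q) = threshold
q = prior_guilty * (1 - threshold) / (threshold * (1 - prior_guilty))

p_convict_signal = prior_guilty + (1 - prior_guilty) * q
posterior = prior_guilty / p_convict_signal

print(f"q = {q:.4f}")                                 # 3/7
print(f"P(signal=convict) = {p_convict_signal:.2f}")  # 0.60
print(f"posterior after 'convict' = {posterior:.2f}") # 0.50
```

With full information disclosure the sender would obtain conviction with probability 0.3; the committed scheme raises this to 0.6. Note that this computation requires knowing the receiver's threshold, i.e., the receiver's utility function — exactly the assumption the paper relaxes.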
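The regret benchmark against a best-in-hindsight scheme can also be sketched in miniature. The snippet below runs Hedge (multiplicative weights) over a hypothetical finite menu of three signaling schemes under full feedback, with made-up sender utilities per receiver type; the paper's algorithms handle the actual (continuous) space of signaling schemes and are substantially more involved. This only illustrates what "regret sublinear in the number of rounds" means:

```python
# Minimal no-regret sketch: Hedge over a hypothetical finite menu of
# signaling schemes under full feedback. Utilities are invented for
# illustration; this is NOT the paper's algorithm.
import math
import random

random.seed(0)
n_schemes, T, eta = 3, 2000, 0.05

# utility[receiver_type][scheme]: sender's utility (assumed values)
utility = [[0.6, 0.3, 0.5],
           [0.2, 0.7, 0.4],
           [0.5, 0.4, 0.6]]

weights = [1.0] * n_schemes
alg_total = 0.0                      # algorithm's cumulative utility
per_scheme_total = [0.0] * n_schemes # each fixed scheme's cumulative utility

for t in range(T):
    r_type = random.randrange(3)     # stands in for the adversary's choice
    total_w = sum(weights)
    probs = [w / total_w for w in weights]
    alg_total += sum(p * utility[r_type][i] for i, p in enumerate(probs))
    # Full feedback: the utility of every scheme is observed this round.
    for i in range(n_schemes):
        per_scheme_total[i] += utility[r_type][i]
        weights[i] *= math.exp(eta * utility[r_type][i])

regret = max(per_scheme_total) - alg_total
print(f"average regret after {T} rounds: {regret / T:.4f}")
```

Under the partial feedback model the sender observes far less per round, which is what makes the sublinear-regret guarantees in the paper nontrivial.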




Updated: 2022-11-17