Convergence Analysis of No-Regret Bidding Algorithms in Repeated Auctions
arXiv - CS - Computer Science and Game Theory Pub Date : 2020-09-14 , DOI: arxiv-2009.06136
Zhe Feng, Guru Guruganesh, Christopher Liaw, Aranyak Mehta, Abhishek Sethi

The connection between games and no-regret algorithms has been widely studied in the literature. A fundamental result is that when all players play no-regret strategies, this produces a sequence of actions whose time-average is a coarse-correlated equilibrium of the game. However, much less is known about equilibrium selection when multiple equilibria exist. In this work, we study the convergence of no-regret bidding algorithms in auctions. Besides being of theoretical interest, bidding dynamics in auctions are an important question from a practical viewpoint as well. We study a repeated game between bidders in which a single item is sold at each time step and each bidder's value is drawn from an unknown distribution. We show that if the bidders use any mean-based learning rule, then they converge with high probability to the truthful pure Nash equilibrium in a second-price auction and in a VCG auction in the multi-slot setting, and to the Bayesian Nash equilibrium in a first-price auction. We note that mean-based algorithms cover a wide variety of known no-regret algorithms such as Exp3, UCB, and $\epsilon$-Greedy. We also analyze the convergence of the individual iterates produced by such learning algorithms, as opposed to the time-average of the sequence. Our experiments corroborate our theoretical findings and also show similar convergence when we use other strategies such as Deep Q-Learning.
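To make the setting concrete, the following is a minimal sketch of two $\epsilon$-Greedy bidders in a repeated second-price auction. It is not the authors' experimental code. The function names, the discretized value/bid grid, the uniform value distribution, bandit feedback, and the rule that ties count as a loss are all assumptions made for illustration.

```python
import random


def second_price_utility(value, bid, other_bid):
    # The winner pays the other bid; ties count as a loss (assumption).
    return value - other_bid if bid > other_bid else 0.0


def simulate(rounds=20000, grid=5, epsilon=0.1, seed=0):
    """Two epsilon-greedy bidders; returns each bidder's greedy bid per value."""
    rng = random.Random(seed)
    levels = [k / grid for k in range(grid + 1)]  # shared value and bid grid (assumption)
    n = len(levels)
    # total[i][v][b] and count[i][v][b]: reward statistics of bidder i
    # for value index v and bid index b.
    total = [[[0.0] * n for _ in range(n)] for _ in range(2)]
    count = [[[0] * n for _ in range(n)] for _ in range(2)]

    def greedy(i, v):
        # Bid index with the highest empirical mean reward (0.0 if never tried).
        def mean(b):
            return total[i][v][b] / count[i][v][b] if count[i][v][b] else 0.0
        return max(range(n), key=mean)

    for _ in range(rounds):
        # Each bidder's value is drawn uniformly from the grid (assumption).
        values = [rng.randrange(n) for _ in range(2)]
        # Explore with probability epsilon, otherwise play the greedy bid.
        bids = [
            rng.randrange(n) if rng.random() < epsilon else greedy(i, values[i])
            for i in range(2)
        ]
        for i in range(2):
            v, b = values[i], bids[i]
            u = second_price_utility(levels[v], levels[b], levels[bids[1 - i]])
            total[i][v][b] += u  # bandit feedback: only the played bid is updated
            count[i][v][b] += 1

    return [[levels[greedy(i, v)] for v in range(n)] for i in range(2)]
```

Calling `simulate()` returns, for each bidder, the bid it currently prefers at every value on the grid; comparing that list with the grid itself shows how close the learned bids are to truthful bidding. The sketch makes no convergence guarantee of its own, and the outcome depends on the seed, the grid size, and the exploration rate.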

Updated: 2020-09-15