Asymptotically optimal algorithms for budgeted multiple play bandits, Machine Learning

Asymptotically optimal algorithms for budgeted multiple play bandits
Machine Learning (IF 7.5), Pub Date: 2019-05-16, DOI: 10.1007/s10994-019-05799-x
Alex Luedtke, Emilie Kaufmann, Antoine Chambaz

We study a generalization of the multi-armed bandit problem with multiple plays where there is a cost associated with pulling each arm and the agent has a budget at each time that dictates how much she can expect to spend. We derive an asymptotic regret lower bound for any uniformly efficient algorithm in our setting. We then study a variant of Thompson sampling for Bernoulli rewards and a variant of KL-UCB for both single-parameter exponential families and bounded, finitely supported rewards. We show these algorithms are asymptotically optimal, both in rate and leading problem-dependent constants, including in the thick margin setting where multiple arms fall on the decision boundary.
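To make the setting in the abstract concrete, here is a minimal Python sketch of one way a Thompson sampling variant for Bernoulli rewards could respect a per-round expected budget. It is an illustration under stated assumptions, not the authors' algorithm as specified in the paper: the helper names (`pull_probabilities`, `thompson_round`, `simulate`), the uniform Beta(1, 1) prior, and the greedy reward-to-cost allocation are all hypothetical choices made for this example. Each arm is pulled independently with some probability, and the probabilities are chosen so that the expected cost of a round does not exceed the budget.

```python
import random


def pull_probabilities(scores, costs, budget):
    """Greedy fractional knapsack (assumed allocation rule, costs must be > 0).

    Returns probabilities q[k] in [0, 1] that maximize sum(q[k] * scores[k])
    subject to sum(q[k] * costs[k]) <= budget.
    """
    q = [0.0] * len(scores)
    remaining = budget
    # Visit arms from the highest to the lowest score-to-cost ratio.
    order = sorted(range(len(scores)),
                   key=lambda k: scores[k] / costs[k], reverse=True)
    for k in order:
        if remaining <= 0 or scores[k] <= 0:
            break
        # Pull with probability 1 if affordable, else spend what is left.
        q[k] = min(1.0, remaining / costs[k])
        remaining -= q[k] * costs[k]
    return q


def thompson_round(successes, failures, costs, budget, rng):
    """One round: sample means from Beta posteriors, then randomize pulls."""
    # Beta(1 + successes, 1 + failures) posterior under a uniform prior.
    samples = [rng.betavariate(1 + s, 1 + f)
               for s, f in zip(successes, failures)]
    q = pull_probabilities(samples, costs, budget)
    # Arm k is pulled independently with probability q[k], so the
    # expected cost of the round is at most the budget.
    return [k for k, p in enumerate(q) if rng.random() < p]


def simulate(means, costs, budget, horizon, seed=0):
    """Run the sketch on Bernoulli arms with the given true means."""
    rng = random.Random(seed)
    successes = [0] * len(means)
    failures = [0] * len(means)
    for _ in range(horizon):
        for k in thompson_round(successes, failures, costs, budget, rng):
            if rng.random() < means[k]:
                successes[k] += 1
            else:
                failures[k] += 1
    return successes, failures
```

For example, `pull_probabilities([0.9, 0.5, 0.1], [1.0, 1.0, 1.0], 1.5)` pulls the first arm with probability 1 and the second with probability 0.5. When several arms share the same reward-to-cost ratio at the budget boundary (the thick margin case mentioned in the abstract), this sketch simply breaks ties by sort order; how ties are handled in the paper is not stated in the abstract.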

Updated: 2019-05-16
