Ballooning Multi-armed Bandits
Artificial Intelligence (IF 14.4), Pub Date: 2021-02-24, DOI: 10.1016/j.artint.2021.103485
Ganesh Ghalme, Swapnil Dhamal, Shweta Jain, Sujit Gujar, Y. Narahari

In this paper, we introduce ballooning multi-armed bandits (BL-MAB), a novel extension of the classical stochastic MAB model. In the BL-MAB model, the set of available arms grows (or balloons) over time. In contrast to the classical MAB setting, where regret is computed with respect to the single best arm overall, regret in the BL-MAB setting is computed with respect to the best arm available at each time instant. We first observe that existing stochastic MAB algorithms incur linear regret under the BL-MAB model. We then prove that if the best arm is equally likely to arrive at any time instant, sub-linear regret cannot be achieved. Next, we show that if the best arm is more likely to arrive in the early rounds, sub-linear regret is achievable. Our proposed algorithm determines (1) the fraction of the time horizon during which newly arriving arms are explored and (2) the sequence of arm pulls in the exploitation phase from among the explored arms. Under reasonable assumptions on the arrival distribution of the best arm, stated in terms of the thinness of the distribution's tail, we prove that the proposed algorithm achieves sub-linear instance-independent regret. We further quantify the explicit dependence of the regret on the arrival distribution parameters. We reinforce our theoretical findings with extensive simulation results. We conclude by showing that our algorithm achieves sub-linear regret even if (a) the distributional parameters are not exactly known but are obtained using a reasonable learning mechanism, or (b) the best arm is not more likely to arrive early, but a large fraction of the arms is likely to arrive relatively early.
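To make the setting concrete: with horizon T, if I_t is the arm pulled at round t and mu*_t is the mean reward of the best arm available at round t, the BL-MAB regret is R(T) = sum over t = 1..T of (mu*_t - mu_{I_t}). The sketch below illustrates the explore-then-exploit structure the abstract describes: arms arriving within an initial fraction of the horizon are admitted and played with a UCB1 index, while later arrivals are ignored. This is a minimal illustration under stated assumptions, not the authors' exact algorithm; in particular, the exploration fraction here is an arbitrary placeholder, whereas the paper derives it from the tail parameters of the best arm's arrival distribution, and all names (bl_mab_explore_then_exploit, arm_arrivals, pull) are hypothetical.

```python
import math
import random


def bl_mab_explore_then_exploit(horizon, arm_arrivals, pull, explore_frac=0.3):
    """Explore-then-exploit sketch for ballooning bandits (hypothetical names).

    horizon      -- number of rounds T
    arm_arrivals -- dict mapping round t to the list of arm ids that become
                    available at round t (the arm set balloons over time)
    pull         -- pull(arm) returns a stochastic reward in [0, 1]
    explore_frac -- fraction of the horizon during which newly arriving arms
                    are admitted; a placeholder for the fraction the paper
                    derives from the arrival distribution's tail
    """
    cutoff = int(explore_frac * horizon)
    counts, sums = {}, {}  # pull counts and reward sums of admitted arms

    for t in range(1, horizon + 1):
        if t <= cutoff:
            # Admit arms arriving before the cutoff; later arrivals are ignored.
            for arm in arm_arrivals.get(t, []):
                counts[arm], sums[arm] = 0, 0.0
        if not counts:
            continue  # no arm has arrived yet

        def ucb1(arm):
            # Unpulled arms get priority; otherwise empirical mean plus
            # a confidence width that shrinks with the number of pulls.
            if counts[arm] == 0:
                return math.inf
            return sums[arm] / counts[arm] + math.sqrt(2.0 * math.log(t) / counts[arm])

        chosen = max(counts, key=ucb1)
        reward = pull(chosen)
        counts[chosen] += 1
        sums[chosen] += reward
    return counts, sums


# Example use: one Bernoulli arm arrives per round for the first 50 of 1000 rounds.
means = {a: random.random() for a in range(50)}
arrivals = {t: [t - 1] for t in range(1, 51)}
bl_mab_explore_then_exploit(1000, arrivals, lambda a: float(random.random() < means[a]))
```

Ignoring arms that arrive after the cutoff is what makes sub-linear regret possible when the best arm tends to arrive early; if instead the best arm were equally likely to arrive at any instant, any such cutoff rule would miss it with constant probability, which matches the linear-regret lower bound stated in the abstract.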



Updated: 2021-02-24