Linear Bandit Algorithms with Sublinear Time Complexity
arXiv - CS - Machine Learning. Pub Date: 2021-03-03, DOI: arxiv-2103.02729
Shuo Yang, Tongzheng Ren, Sanjay Shakkottai, Eric Price, Inderjit S. Dhillon, Sujay Sanghavi

We propose to accelerate existing linear bandit algorithms to achieve per-step time complexity sublinear in the number of arms $K$. The key to sublinear complexity is the realization that the arm selection in many linear bandit algorithms reduces to the maximum inner product search (MIPS) problem. Correspondingly, we propose an algorithm that approximately solves the MIPS problem for a sequence of adaptive queries yielding near-linear preprocessing time complexity and sublinear query time complexity. Using the proposed MIPS solver as a sub-routine, we present two bandit algorithms (one based on UCB, and the other based on TS) that achieve sublinear time complexity. We explicitly characterize the tradeoff between the per-step time complexity and regret, and show that our proposed algorithms can achieve $O(K^{1-\alpha(T)})$ per-step complexity for some $\alpha(T) > 0$ and $\widetilde O(\sqrt{T})$ regret, where $T$ is the time horizon. Further, we present the theoretical limit of the tradeoff, which provides a lower bound for the per-step time complexity. We also discuss other choices of approximate MIPS algorithms and other applications to linear bandit problems.
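To make the reduction to MIPS concrete, here is a minimal Python sketch, not the authors' algorithm. It assumes a standard Thompson-sampling setup with a Gaussian posterior; the names `mips_exact`, `ts_select_arm`, `A`, `b`, and `scale` are hypothetical and chosen only for illustration. Once a parameter vector has been sampled, choosing the arm is exactly a maximum inner product search over the $K$ arm feature vectors. The brute-force search below costs $O(Kd)$ per step, and it is this call that an approximate MIPS solver with sublinear query time would replace.

```python
import numpy as np


def mips_exact(arms, query):
    """Brute-force maximum inner product search.

    arms: (K, d) array of arm feature vectors; query: (d,) vector.
    Costs O(K * d) per query, i.e. linear in the number of arms K.
    """
    return int(np.argmax(arms @ query))


def ts_select_arm(arms, A, b, rng, scale=1.0):
    """One illustrative Thompson-sampling arm selection for a linear bandit.

    Assumed (not taken from the paper): A is the regularized design matrix
    (sum of x x^T plus a ridge term) and b is the reward-weighted sum of
    the features of the arms played so far.
    """
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b  # ridge-regression estimate of the parameter
    # Sample a parameter vector from the Gaussian posterior.
    theta_tilde = rng.multivariate_normal(theta_hat, scale**2 * A_inv)
    # Arm selection is a MIPS query with the sampled vector.
    return mips_exact(arms, theta_tilde)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    arms = rng.normal(size=(1000, 8))  # K = 1000 arms in d = 8 dimensions
    chosen = ts_select_arm(arms, np.eye(8), np.zeros(8), rng)
    print("selected arm index:", chosen)
```

A UCB-style rule adds a confidence bonus to each arm's score, so it is not a plain inner product as written here; how the paper casts that case as MIPS is not stated in the abstract.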

Updated: 2021-03-05