Adaptive Discretization for Adversarial Bandits with Continuous Action Spaces
arXiv - CS - Computer Science and Game Theory. Pub Date: 2020-06-22, DOI: arxiv-2006.12367
Chara Podimata, Aleksandrs Slivkins

Lipschitz bandits is a prominent version of multi-armed bandits that studies large, structured action spaces such as the [0,1] interval, where similar actions are guaranteed to have similar rewards. A central theme here is the adaptive discretization of the action space, which gradually "zooms in" on the more promising regions thereof. The goal is to take advantage of "nicer" problem instances, while retaining near-optimal worst-case performance. While the stochastic version of the problem is well-understood, the general version with adversarially chosen rewards is not. We provide the first algorithm for adaptive discretization in the adversarial version, and derive instance-dependent regret bounds. In particular, we recover the worst-case optimal regret bound for the adversarial version, and the instance-dependent regret bound for the stochastic version. Further, an application of our algorithm to dynamic pricing (a version in which the algorithm repeatedly adjusts prices for a product) enjoys these regret bounds without any smoothness assumptions.
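To make the idea of adaptive discretization concrete, here is a minimal sketch of a "zooming"-style Lipschitz bandit on the interval [0,1] in the stochastic setting. It is only an illustration of the general adaptive-discretization theme the abstract describes, not the paper's adversarial algorithm; the horizon T, the reward function true_mean, and the confidence-radius constants are illustrative assumptions.

import math
import random

T = 5000                      # time horizon (assumed for illustration)

def true_mean(x):             # hypothetical Lipschitz mean-reward function
    return 0.9 - abs(x - 0.7)

active = []                   # active arms: list of [point, pull_count, reward_sum]

def radius(n):                # confidence radius shrinks as an arm is pulled more
    return math.sqrt(2.0 * math.log(T) / (n + 1))

for t in range(1, T + 1):
    # Activation rule: if some point of [0,1] is not covered by any active arm's
    # confidence ball, activate a new arm there (checked on a fine grid here).
    grid = [i / 1000.0 for i in range(1001)]
    for y in grid:
        if all(abs(y - x) > radius(n) for x, n, _ in active):
            active.append([y, 0, 0.0])
            break

    # Selection rule: play the active arm with the largest optimistic index
    # (empirical mean plus twice the confidence radius).
    def index(arm):
        x, n, s = arm
        mean = s / n if n > 0 else 1.0
        return mean + 2.0 * radius(n)

    arm = max(active, key=index)

    # Pull the chosen arm and observe a noisy reward around its Lipschitz mean.
    reward = true_mean(arm[0]) + random.uniform(-0.1, 0.1)
    arm[1] += 1
    arm[2] += reward

best = max(active, key=lambda a: a[1])
print(f"most-pulled arm: x = {best[0]:.3f} ({best[1]} pulls)")

Because new arms are activated only where existing confidence balls fail to cover the space, the grid of arms ends up densest near the high-reward regions, which is exactly the "zooming in" behavior referred to above.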

Updated: 2020-06-23