Learning Unknown Service Rates in Queues: A Multiarmed Bandit Approach
Operations Research (IF 2.2), Pub Date: 2021-01-01, DOI: 10.1287/opre.2020.1995
Subhashini Krishnasamy, Rajat Sen, Ramesh Johari, Sanjay Shakkottai

Consider a queueing system consisting of multiple servers. Jobs arrive over time and enter a queue for service; the goal is to minimize the size of this queue. At each opportunity for service, at most one server can be chosen, and at most one job can be served. Service is successful with a probability (the service probability) that is a priori unknown for each server. An algorithm that knows the service probabilities (the "genie") can always choose the server of highest service probability. We study algorithms that learn the unknown service probabilities. Our goal is to minimize queue-regret: the (expected) difference between the queue-lengths obtained by the algorithm, and those obtained by the "genie." Since queue-regret cannot be larger than classical regret, results for the standard multi-armed bandit problem give algorithms for which queue-regret increases no more than logarithmically in time. Our paper shows surprisingly more complex behavior. In particular, as long as the bandit algorithm's queues have relatively long regenerative cycles, queue-regret is similar to cumulative regret, and scales (essentially) logarithmically. However, we show that this "early stage" of the queueing bandit eventually gives way to a "late stage", where the optimal queue-regret scaling is $O(1/t)$. We demonstrate an algorithm that (order-wise) achieves this asymptotic queue-regret in the late stage. Our results are developed in a more general model that allows for multiple job classes as well.
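The single-queue model described above is easy to simulate. The sketch below compares a standard UCB1 learning rule against the genie that always picks the best server; the arrival rate, service probabilities, and the `simulate` helper are illustrative assumptions, and UCB1 is only a stand-in for the structured algorithms the paper actually analyzes.

```python
import math
import random

def simulate(policy, mus, lam, T, seed=0):
    """Run a single-queue, multi-server bandit system for T slots.

    Each slot: a job arrives w.p. lam; the policy picks one server;
    the service attempt succeeds w.p. mus[k], and a waiting job (if any)
    departs on success.  Returns the queue-length trajectory.
    """
    rng = random.Random(seed)
    q, traj = 0, []
    succ = [0] * len(mus)    # observed service successes per server
    pulls = [0] * len(mus)   # number of times each server was chosen
    for t in range(1, T + 1):
        q += rng.random() < lam           # Bernoulli arrival
        k = policy(t, succ, pulls)        # schedule one server
        pulls[k] += 1
        served = rng.random() < mus[k]    # service outcome (assumed observed each slot)
        succ[k] += served
        if q > 0 and served:
            q -= 1                        # one job departs
        traj.append(q)
    return traj

def ucb(t, succ, pulls):
    """UCB1 index over servers: empirical mean plus exploration bonus."""
    for k, n in enumerate(pulls):
        if n == 0:                        # pull each server once first
            return k
    return max(range(len(pulls)),
               key=lambda k: succ[k] / pulls[k]
                             + math.sqrt(2 * math.log(t) / pulls[k]))

genie = lambda t, succ, pulls: 0          # knows server 0 is best

mus, lam, T, runs = [0.9, 0.5, 0.4], 0.6, 10000, 20
# Queue-regret at time T: expected queue-length gap vs. the genie,
# estimated with common random numbers across the two policies.
reg = sum(simulate(ucb, mus, lam, T, seed=r)[-1]
          - simulate(genie, mus, lam, T, seed=r)[-1]
          for r in range(runs)) / runs
```

With a stable system (lam below the best service rate), plotting the per-slot gap over time illustrates the paper's two regimes: an early stage where the gap grows while the algorithm is still exploring, and a late stage where it decays toward zero.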

Updated: 2021-01-01