Combinatorial Sleeping Bandits with Fairness Constraints
IEEE Transactions on Network Science and Engineering ( IF 6.7 ) Pub Date : 2020-07-01 , DOI: 10.1109/tnse.2019.2954310
Fengjiao Li , Jia Liu , Bo Ji

The multi-armed bandit (MAB) model has been widely adopted for studying many practical optimization problems with unknown parameters (network resource allocation, ad placement, crowdsourcing, etc.). The goal of the player (i.e., the decision maker) is to maximize the cumulative reward in the face of uncertainty. However, the basic MAB model neglects several important factors of the system in many real-world applications, where multiple arms (i.e., actions) can be played simultaneously and an arm could sometimes be “sleeping” (i.e., unavailable). Besides reward maximization, ensuring fairness is also a key design concern in practice. To that end, we propose a new Combinatorial Sleeping MAB model with Fairness constraints, called CSMAB-F, aiming to address the aforementioned crucial modeling issues. The objective is now to maximize the reward while satisfying the fairness requirement of a minimum selection fraction for each individual arm. To tackle this new problem, we extend an online learning algorithm, called Upper Confidence Bound (UCB), to deal with a critical tradeoff between exploitation and exploration, and employ the virtual queue technique to properly handle the fairness constraints. By carefully integrating these two techniques, we develop a new algorithm, called Learning with Fairness Guarantee (LFG), for the CSMAB-F problem. Further, we rigorously prove that not only is LFG feasibility-optimal, but it also has a time-average regret upper bounded by $\frac{N}{2 \eta } + \frac{\beta _1 \sqrt{m N T \log {T}}+ \beta _2 N}{T}$, where $N$ is the total number of arms, $m$ is the maximum number of arms that can be simultaneously played, $T$ is the time horizon, $\beta _1$ and $\beta _2$ are constants, and $\eta$ is a design parameter that we can tune. Finally, we perform extensive simulations to corroborate the effectiveness of the proposed algorithm.
Interestingly, the simulation results reveal an important tradeoff between the regret and the speed of convergence to a point satisfying the fairness constraints.
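The abstract describes LFG as a combination of UCB-style exploration with virtual queues that track each arm's fairness "debt." The following is a minimal, hypothetical sketch of that idea, not the paper's exact algorithm: the particular weight used to rank arms (queue length plus $\eta$ times the UCB index), the exploration-bonus constant, and the arm-availability model are all assumptions made for illustration.

```python
import math
import random

def lfg_sketch(mu, min_frac, m, T, eta=100.0, seed=0):
    """Illustrative sketch of Learning with Fairness Guarantee (LFG).

    mu       : true Bernoulli reward means (unknown to the learner)
    min_frac : required minimum selection fraction per arm
    m        : maximum number of arms playable per round
    eta      : design parameter trading off regret vs. convergence speed
    Returns the per-arm play counts and final virtual queue lengths.
    """
    rng = random.Random(seed)
    N = len(mu)
    Q = [0.0] * N        # virtual queues: accumulated fairness debt
    counts = [0] * N     # number of times each arm was played
    means = [0.0] * N    # empirical reward means

    for t in range(1, T + 1):
        # Sleeping arms: each arm is available with some probability
        # (an assumed availability model, for illustration only).
        avail = [i for i in range(N) if rng.random() < 0.9]
        if not avail:
            continue

        def ucb(i):
            # UCB index: empirical mean plus an exploration bonus.
            if counts[i] == 0:
                return 1.0
            return min(1.0, means[i] + math.sqrt(1.5 * math.log(t) / counts[i]))

        # Hypothetical selection weight: fairness debt + eta * UCB index.
        # Larger eta favors reward (lower regret) but slows convergence
        # to the fairness constraints, matching the tradeoff in the paper.
        weight = {i: Q[i] + eta * ucb(i) for i in avail}
        chosen = sorted(avail, key=lambda i: -weight[i])[:m]

        for i in chosen:
            r = 1.0 if rng.random() < mu[i] else 0.0
            counts[i] += 1
            means[i] += (r - means[i]) / counts[i]

        # Virtual queue update: debt grows by the required fraction each
        # round and drains by one whenever the arm is actually played.
        for i in range(N):
            served = 1.0 if i in chosen else 0.0
            Q[i] = max(Q[i] + min_frac[i] - served, 0.0)

    return counts, Q
```

With a large horizon, the per-arm selection fractions `counts[i] / T` should approach or exceed `min_frac[i]` whenever the constraints are feasible, while the UCB term steers the remaining capacity toward high-reward arms.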

Updated: 2020-07-01