Safe Linear Thompson Sampling With Side Information
IEEE Transactions on Signal Processing (IF 5.4), Pub Date: 2021-06-16, DOI: 10.1109/tsp.2021.3089822
Ahmadreza Moradipari, Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis

The design and performance analysis of bandit algorithms in the presence of stage-wise safety or reliability constraints have recently garnered significant interest. In this work, we consider the linear stochastic bandit problem under additional unknown linear safety constraints that need to be satisfied at each round. For this problem, we present and analyze a new safe algorithm based on linear Thompson Sampling (TS). Our analysis shows that, with high probability, the algorithm chooses actions that are safe at each round and achieves cumulative regret of order $\mathcal{O}(d^{3/2}\log^{1/2} d \cdot T^{1/2}\log^{3/2} T)$. Remarkably, this matches the regret bound provided by [1], [2] for the standard linear TS algorithm in the absence of safety constraints. Our analysis also highlights how the inherently randomized nature of the TS selection rule suffices to properly expand the set of safe actions that the algorithm has access to at each round. In particular, we compare this behavior to that of alternative safe algorithms, which typically require distinct rounds of randomization dedicated to learning the unknown constraints.
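
To make the idea concrete, the following is a minimal sketch of a safe linear TS loop. It is not the authors' algorithm as published: the abstract does not spell out the constraint form, the confidence radius, or the feedback model. The sketch assumes a finite action set that contains the zero vector as a known safe action, a single constraint of the form $x^\top \mu_\star \le c$, a noisy measurement of the constraint value at each round (taken here to be the "side information"), and one fixed radius `beta` used for both the safety check and the TS perturbation. The function name `safe_lts` and all of its parameters are hypothetical.

```python
import numpy as np

def safe_lts(actions, theta_star, mu_star, c, T,
             beta=1.0, lam=1.0, noise_sd=0.1, seed=0):
    """Toy safe linear TS loop (illustrative only); returns chosen action indices.

    Assumptions (not from the paper): finite `actions` (n x d, float) that
    include the zero vector, constraint x @ mu_star <= c with c >= 0, and a
    fixed confidence radius `beta`.
    """
    rng = np.random.default_rng(seed)
    d = actions.shape[1]
    V = lam * np.eye(d)       # regularized Gram matrix, shared by both estimators
    b_reward = np.zeros(d)    # running sum of x_t * reward_t
    b_safety = np.zeros(d)    # running sum of x_t * constraint measurement_t
    chosen = []
    for _ in range(T):
        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ b_reward   # ridge estimate of the reward parameter
        mu_hat = V_inv @ b_safety      # ridge estimate of the constraint parameter

        # Pessimistic safe set: keep x only if an upper confidence bound
        # on x @ mu_star is still below the threshold c.
        quad = np.einsum("ij,jk,ik->i", actions, V_inv, actions)
        width = np.sqrt(np.maximum(quad, 0.0))   # ||x||_{V^{-1}} per action
        safe = actions @ mu_hat + beta * width <= c

        # TS step: perturb the reward estimate with covariance beta^2 * V^{-1}.
        chol = np.linalg.cholesky(V_inv)
        theta_tilde = theta_hat + beta * chol @ rng.standard_normal(d)

        # Play the best action under the sampled parameter, restricted to the safe set.
        scores = np.where(safe, actions @ theta_tilde, -np.inf)
        i = int(np.argmax(scores))
        x = actions[i]

        # Observe noisy reward and noisy constraint value, then update statistics.
        reward = x @ theta_star + noise_sd * rng.standard_normal()
        safety = x @ mu_star + noise_sd * rng.standard_normal()
        V += np.outer(x, x)
        b_reward += reward * x
        b_safety += safety * x
        chosen.append(i)
    return chosen
```

Note that the loop has no separate exploration phase for the constraint: the only randomness is the TS perturbation, and every played action shrinks the confidence width along its direction, which is what lets the pessimistic safe set grow over time. In a real analysis `beta` would be a time-dependent confidence radius chosen so that the safety check holds with high probability; the constant used here is a placeholder.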

Updated: 2021-07-30