Self-Concordant Analysis of Generalized Linear Bandits with Forgetting
arXiv - CS - Artificial Intelligence Pub Date : 2020-11-02 , DOI: arxiv-2011.00819
Yoan Russac (DI-ENS, CNRS, PSL, VALDA), Louis Faury (DI-ENS, VALDA), Olivier Cappé (DI-ENS, VALDA), Aurélien Garivier (UMPA-ENSL)

Contextual sequential decision problems with categorical or numerical observations are ubiquitous, and Generalized Linear Bandits (GLB) offer a solid theoretical framework to address them. In contrast to the case of linear bandits, existing algorithms for GLB have two drawbacks undermining their applicability. First, they rely on excessively pessimistic concentration bounds due to the non-linear nature of the model. Second, they require either non-convex projection steps or burn-in phases to enforce boundedness of the estimators. Both of these issues are worsened when considering non-stationary models, in which the GLB parameter may vary with time. In this work, we focus on self-concordant GLB (which include logistic and Poisson regression) with forgetting achieved either by the use of a sliding window or exponential weights. We propose a novel confidence-based algorithm for the maximum-likelihood estimator with forgetting and analyze its performance in abruptly changing environments. These results, as well as the accompanying numerical simulations, highlight the potential of the proposed approach to address non-stationarity in GLB.
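To illustrate the forgetting mechanism the abstract describes, here is a minimal sketch of an exponentially weighted maximum-likelihood estimator for a logistic model, where observation t receives weight γ^(T−1−t) so that older data is discounted. This is a hypothetical illustration, not the authors' implementation: the function name, the ridge regularizer `lam`, and the Newton-step solver are all assumptions made for the sake of a runnable example.

```python
import numpy as np

def weighted_logistic_mle(contexts, rewards, gamma=0.99, lam=1.0, n_iter=20):
    """Exponentially weighted (forgetting) MLE for a logistic reward model.

    Observation t gets weight gamma**(T-1-t), so recent data dominates when
    the underlying parameter drifts. The ridge term lam keeps the problem
    strongly convex. Hypothetical sketch, not the paper's algorithm.
    """
    X = np.asarray(contexts, dtype=float)    # shape (T, d)
    y = np.asarray(rewards, dtype=float)     # shape (T,), values in {0, 1}
    T, d = X.shape
    w = gamma ** np.arange(T - 1, -1, -1)    # forgetting weights, newest = 1
    theta = np.zeros(d)
    for _ in range(n_iter):                  # Newton-Raphson iterations
        p = 1.0 / (1.0 + np.exp(-X @ theta))                        # predicted means
        grad = X.T @ (w * (p - y)) + lam * theta                    # weighted gradient
        H = X.T @ (X * (w * p * (1 - p))[:, None]) + lam * np.eye(d)  # weighted Hessian
        theta -= np.linalg.solve(H, grad)
    return theta
```

Setting `gamma=1.0` recovers the ordinary (non-forgetting) regularized MLE; a sliding-window variant would instead zero out the weights of all but the most recent observations.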

Updated: 2020-11-02