Randomized allocation with nonparametric estimation for contextual multi-armed bandits with delayed rewards
Statistics & Probability Letters (IF 0.9), Pub Date: 2020-09-01, DOI: 10.1016/j.spl.2020.108818
Sakshi Arya, Yuhong Yang

We study a multi-armed bandit problem with covariates in a setting where there is a possible delay in observing the rewards. Under some mild assumptions on the probability distributions for the delays and using an appropriate randomization to select the arms, the proposed strategy is shown to be strongly consistent.
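
To make the setting concrete, the following is a minimal illustrative sketch of randomized arm selection with a nonparametric reward estimate when rewards arrive late. It is not the authors' algorithm. Everything in it is an assumption made for illustration: the Gaussian-kernel (Nadaraya-Watson) estimator, the 1/sqrt(t) exploration schedule, the uniform delay distribution, the noise level, and the names `kernel_estimate` and `run_bandit`.

```python
import math
import random


def kernel_estimate(x, history, bandwidth):
    """Nadaraya-Watson estimate of the mean reward at covariate x.

    history is a list of (covariate, reward) pairs already observed.
    Returns 0.0 when no usable observations exist (an arbitrary default).
    """
    num = 0.0
    den = 0.0
    for xi, yi in history:
        w = math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)  # Gaussian kernel weight
        num += w * yi
        den += w
    return num / den if den > 0 else 0.0


def run_bandit(mean_fns, horizon, max_delay=5, bandwidth=0.1, seed=0):
    """Simulate a contextual bandit with delayed rewards.

    mean_fns: one function per arm mapping a covariate in [0, 1] to a mean reward.
    Returns the average per-round regret over the horizon.
    """
    rng = random.Random(seed)
    n_arms = len(mean_fns)
    observed = [[] for _ in range(n_arms)]  # revealed (x, reward) pairs per arm
    pending = []                            # (reveal_time, arm, x, reward)
    total_regret = 0.0

    for t in range(1, horizon + 1):
        # Move rewards whose delay has elapsed into the observed history.
        still_pending = []
        for reveal_time, arm, x, y in pending:
            if reveal_time <= t:
                observed[arm].append((x, y))
            else:
                still_pending.append((reveal_time, arm, x, y))
        pending = still_pending

        x_t = rng.random()                    # covariate for this round
        eps_t = min(1.0, 1.0 / math.sqrt(t))  # assumed exploration schedule

        if rng.random() < eps_t:
            arm = rng.randrange(n_arms)       # randomize: explore uniformly
        else:
            estimates = [
                kernel_estimate(x_t, observed[a], bandwidth) for a in range(n_arms)
            ]
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit

        reward = mean_fns[arm](x_t) + rng.gauss(0.0, 0.1)
        delay = rng.randint(0, max_delay)     # assumed delay distribution
        pending.append((t + delay, arm, x_t, reward))

        best = max(f(x_t) for f in mean_fns)
        total_regret += best - mean_fns[arm](x_t)

    return total_regret / horizon


if __name__ == "__main__":
    arms = [lambda x: x, lambda x: 1.0 - x]   # hypothetical mean reward functions
    print(run_bandit(arms, horizon=2000))
```

In this sketch the estimator only ever sees rewards whose delay has elapsed, so the exploration probability has to keep supplying fresh pulls for every arm while older rewards are still pending. The returned average regret is a crude diagnostic only and carries none of the paper's guarantees.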

Updated: 2020-09-01