Randomized allocation with nonparametric estimation for contextual multi-armed bandits with delayed rewards


Statistics & Probability Letters (IF 0.9), Pub Date: 2020-09-01, DOI: 10.1016/j.spl.2020.108818
Sakshi Arya, Yuhong Yang

We study a multi-armed bandit problem with covariates in a setting where there is a possible delay in observing the rewards. Under some mild assumptions on the probability distributions for the delays and using an appropriate randomization to select the arms, the proposed strategy is shown to be strongly consistent.
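
To make the setting in the abstract concrete, here is a minimal Python sketch of a covariate bandit with delayed rewards. It is an illustration under stated assumptions and not the authors' procedure: the Gaussian-kernel (Nadaraya-Watson) estimator, the 1/sqrt(t) exploration probability, the uniform delay distribution, and the names `kernel_estimate` and `run_bandit` are all hypothetical choices made for this example.

```python
import math
import random


def kernel_estimate(x, history, bandwidth):
    """Nadaraya-Watson estimate of the mean reward at covariate x.

    history holds the (covariate, reward) pairs already observed for one arm.
    Returns 0.0 when no usable observation exists (an assumed default).
    """
    num = 0.0
    den = 0.0
    for xi, yi in history:
        w = math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
        num += w * yi
        den += w
    return num / den if den > 0 else 0.0


def run_bandit(mean_fns, horizon, max_delay=5, bandwidth=0.1, seed=0):
    """Simulate randomized arm allocation when rewards arrive late.

    mean_fns: one function per arm mapping a covariate in [0, 1] to a mean reward.
    Delays are drawn uniformly from {0, ..., max_delay} (assumed distribution).
    """
    rng = random.Random(seed)
    n_arms = len(mean_fns)
    observed = [[] for _ in range(n_arms)]  # rewards the learner has seen
    pending = []  # (arrival_round, arm, covariate, reward) not yet seen
    choices = []

    for t in range(1, horizon + 1):
        # Reveal every reward whose delay has elapsed by round t.
        still_pending = []
        for arrival, arm, x, y in pending:
            if arrival <= t:
                observed[arm].append((x, y))
            else:
                still_pending.append((arrival, arm, x, y))
        pending = still_pending

        # Draw a covariate and estimate each arm from observed rewards only.
        x = rng.random()
        estimates = [kernel_estimate(x, observed[a], bandwidth)
                     for a in range(n_arms)]
        greedy = max(range(n_arms), key=lambda a: estimates[a])

        # Randomize: explore uniformly with a probability that decays in t.
        explore_prob = min(1.0, 1.0 / math.sqrt(t))
        if rng.random() < explore_prob:
            arm = rng.randrange(n_arms)
        else:
            arm = greedy

        # The reward is generated now but becomes visible only after the delay.
        reward = mean_fns[arm](x) + rng.gauss(0.0, 0.1)
        delay = rng.randint(0, max_delay)
        pending.append((t + 1 + delay, arm, x, reward))
        choices.append((x, arm))

    return choices, observed, pending


if __name__ == "__main__":
    arms = [lambda x: x, lambda x: 1.0 - x]
    choices, observed, pending = run_bandit(arms, horizon=500)
    print(len(choices), [len(h) for h in observed], len(pending))
```

The point of the sketch is that the estimates at round t use only rewards that have already arrived, so the randomization keeps every arm sampled while delayed feedback is still outstanding.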
