A non-parametric solution to the multi-armed bandit problem with covariates, Journal of Statistical Planning and Inference

Journal of Statistical Planning and Inference (IF 0.8), Pub Date: 2021-03-01, DOI: 10.1016/j.jspi.2020.07.008
Mingyao Ai, Yimin Huang, Jun Yu

Abstract

In recent years, the multi-armed bandit problem has regained popularity, especially in the setting with covariates, because of new applications in customized services such as personalized medicine. To handle the bandit problem with covariates, a policy called binned subsample mean comparison is introduced; it decomposes the original problem into a set of appropriate classic bandit problems. The growth rate of the regret is then studied in a setting where the reward of each arm depends on observable covariates. When the rewards follow an exponential family, it is shown that the regret of the proposed policy achieves a nearly optimal growth rate. Simulations show that the proposed policy performs competitively against other policies.
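The binning idea can be illustrated with a small Python sketch. It assumes a single covariate drawn uniformly from [0, 1], equal-width bins, and Bernoulli rewards (one exponential-family case), and it runs an independent classic bandit inside each bin. The within-bin rule `ssmc_choose` is a simplified, hypothetical subsample-mean comparison and is not the exact policy analyzed in the paper; `binned_policy_sim` and the arm mean functions are likewise illustrative. The returned value is the cumulative pseudo-regret, the sum over rounds of max_k f_k(X_t) - f_{a_t}(X_t).

```python
import math
import random
from typing import Callable, List


def _mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


def ssmc_choose(rewards: List[List[float]], r: int) -> int:
    """Choose an arm for round r of ONE bin, given that bin's reward history.

    Simplified subsample-mean comparison (an assumption, not the paper's exact rule):
    the leader is the arm with the most observations (ties broken by larger mean).
    A challenger with fewer observations wins if it has fewer than sqrt(log r)
    samples, or if its sample mean is >= the mean of SOME block of n_k consecutive
    leader rewards. The first winning challenger is played; otherwise the leader.
    """
    n_arms = len(rewards)
    for k in range(n_arms):  # play every arm once before comparing
        if not rewards[k]:
            return k
    leader = max(range(n_arms), key=lambda k: (len(rewards[k]), _mean(rewards[k])))
    lead = rewards[leader]
    prefix = [0.0]  # prefix sums give each block mean in O(1)
    for y in lead:
        prefix.append(prefix[-1] + y)
    threshold = math.sqrt(math.log(max(r, 2)))
    for k in range(n_arms):
        n_k = len(rewards[k])
        if k == leader or n_k >= len(lead):
            continue
        if n_k < threshold:  # too few samples: force exploration
            return k
        m_k = _mean(rewards[k])
        # Challenger wins if it beats at least one consecutive leader block of size n_k.
        if any(m_k >= (prefix[t + n_k] - prefix[t]) / n_k
               for t in range(len(lead) - n_k + 1)):
            return k
    return leader


def binned_policy_sim(n: int, n_bins: int,
                      arm_means: List[Callable[[float], float]],
                      seed: int = 0) -> float:
    """Simulate n rounds of a binned policy; return cumulative pseudo-regret."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    # One independent classic bandit per bin: history[b][k] = arm k's rewards in bin b.
    history = [[[] for _ in range(n_arms)] for _ in range(n_bins)]
    rounds = [0] * n_bins  # per-bin round counter passed to the within-bin rule
    regret = 0.0
    for _ in range(n):
        x = rng.random()  # covariate X_t ~ Uniform[0, 1) (assumption)
        b = min(int(x * n_bins), n_bins - 1)  # equal-width bin containing x
        rounds[b] += 1
        arm = ssmc_choose(history[b], rounds[b])
        mu = [f(x) for f in arm_means]  # true mean rewards at this covariate
        reward = 1.0 if rng.random() < mu[arm] else 0.0  # Bernoulli reward (assumption)
        history[b][arm].append(reward)
        regret += max(mu) - mu[arm]
    return regret


if __name__ == "__main__":
    # Two hypothetical arms whose ranking flips at x = 0.5.
    arms = [lambda x: 0.3 + 0.4 * x, lambda x: 0.7 - 0.4 * x]
    total = binned_policy_sim(5000, n_bins=5, arm_means=arms, seed=1)
    print(f"cumulative pseudo-regret over 5000 rounds: {total:.1f}")
```

For these two arms, choosing uniformly at random incurs an expected pseudo-regret of 0.1 per round (500 over 5000 rounds), which gives a rough baseline. The bin count trades bias against variance: wider bins pool more data into each classic problem but average over covariate values where the arm ranking may flip (here, near x = 0.5).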

Updated: 2021-03-01
