Thompson Sampling with a Mixture Prior,arXiv - CS - Machine Learning


Thompson Sampling with a Mixture Prior
arXiv - CS - Machine Learning. Pub Date: 2021-06-10, DOI: arxiv-2106.05608
Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh, Craig Boutilier

We study Thompson sampling (TS) in online decision-making problems where the uncertain environment is sampled from a mixture distribution. This is relevant to multi-task settings, where a learning agent is faced with different classes of problems. We incorporate this structure in a natural way by initializing TS with a mixture prior -- dubbed MixTS -- and develop a novel, general technique for analyzing the regret of TS with such priors. We apply this technique to derive Bayes regret bounds for MixTS in both linear bandits and tabular Markov decision processes (MDPs). Our regret bounds reflect the structure of the problem and depend on the number of components and confidence width of each component of the prior. Finally, we demonstrate the empirical effectiveness of MixTS in both synthetic and real-world experiments.
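
To make the idea of a mixture prior concrete, here is a minimal sketch of Thompson sampling with a mixture-of-Gaussians prior. It is not the authors' implementation. It assumes a simplified setting that the abstract does not spell out: a K-armed bandit with Gaussian rewards, known noise variance, and mixture components that are independent Gaussians over the arm means. The class name `MixTSSketch` and all parameter names are hypothetical. Each round, the agent samples a component from the posterior mixture weights, samples arm means from that component's posterior, and pulls the arm with the highest sampled mean.

```python
import numpy as np


class MixTSSketch:
    """Thompson sampling with a Gaussian mixture prior (illustrative sketch).

    Assumptions (not taken from the paper): K-armed Gaussian bandit,
    known noise variance, independent Gaussian prior per arm within
    each mixture component.
    """

    def __init__(self, weights, means, prior_var, noise_var, rng=None):
        # Log mixture weights, shape (L,), kept in log space for stability.
        self.log_w = np.log(np.asarray(weights, dtype=float))
        # Per-component posterior means and variances, shape (L, K).
        self.mean = np.array(means, dtype=float)
        self.var = np.full(self.mean.shape, float(prior_var))
        self.noise_var = float(noise_var)
        self.rng = np.random.default_rng() if rng is None else rng

    def component_probs(self):
        # Normalized posterior probability of each mixture component.
        w = np.exp(self.log_w - self.log_w.max())
        return w / w.sum()

    def select_arm(self):
        # Sample a component, then sample arm means from its posterior.
        comp = self.rng.choice(len(self.log_w), p=self.component_probs())
        theta = self.rng.normal(self.mean[comp], np.sqrt(self.var[comp]))
        return int(np.argmax(theta))

    def update(self, arm, reward):
        m = self.mean[:, arm].copy()
        v = self.var[:, arm].copy()
        # Reweight components by the posterior predictive density of the reward.
        pred_var = v + self.noise_var
        self.log_w += -0.5 * (np.log(2 * np.pi * pred_var)
                              + (reward - m) ** 2 / pred_var)
        # Conjugate Gaussian update of the pulled arm in every component.
        gain = v / pred_var
        self.mean[:, arm] = m + gain * (reward - m)
        self.var[:, arm] = v * (1.0 - gain)
```

In a simulation, one would call `select_arm()`, observe a reward for the chosen arm, and pass both to `update()`. As rewards accumulate, the mixture weights concentrate on the component that best explains the data, which is the behavior the mixture prior is meant to capture in multi-task settings.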


Updated: 2021-06-11
