Forced-exploration free Strategies for Unimodal Bandits
arXiv - CS - Machine Learning Pub Date: 2020-06-30, DOI: arxiv-2006.16569 Hassan Saber (SEQUEL), Pierre Ménard (SEQUEL), Odalric-Ambrym Maillard (SEQUEL)
We consider a multi-armed bandit problem specified by a set of Gaussian or
Bernoulli distributions endowed with a unimodal structure. Although this
problem has been addressed in the literature (Combes and Proutiere, 2014), the
state-of-the-art algorithms for this structure rely on a forced-exploration
mechanism. We introduce IMED-UB, the first forced-exploration-free strategy
that exploits the unimodal structure, obtained by adapting to this setting the
Indexed Minimum Empirical Divergence (IMED) strategy introduced by Honda and
Takemura (2015). This strategy is proven optimal. We then derive KLUCB-UB, a
KLUCB version of IMED-UB, which is also proven optimal. Owing to our proof
technique, we are further able to provide a concise finite-time analysis of
both strategies in a unified way. Numerical experiments show that IMED-UB and
KLUCB-UB perform similarly in practice and outperform the state-of-the-art
algorithms.
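
The abstract does not spell out the index or the arm-selection rule, so the following is only a minimal sketch of what an IMED-style rule restricted to a unimodal neighbourhood might look like, not the authors' exact algorithm. It assumes Bernoulli rewards, arms arranged on a line (so the neighbours of arm a are a-1 and a+1), every arm already pulled at least once, and the standard IMED index of Honda and Takemura (2015), N_a * KL(mean_a, best_mean) + log N_a. The names `bernoulli_kl`, `imed_index` and `choose_arm` are hypothetical.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clamped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def imed_index(n_pulls, mean, best_mean):
    """IMED index: N_a * KL(mean_a, best_mean) + log(N_a). Requires n_pulls >= 1."""
    return n_pulls * bernoulli_kl(mean, best_mean) + math.log(n_pulls)


def choose_arm(counts, means):
    """Return the arm with the smallest IMED index among the empirical leader
    and its neighbours on the line (assumption: unimodal structure on a line graph)."""
    n_arms = len(means)
    leader = max(range(n_arms), key=lambda a: means[a])
    best_mean = means[leader]
    # Only the leader and its immediate neighbours are candidates;
    # no arm is pulled merely to satisfy a forced-exploration schedule.
    candidates = [a for a in (leader - 1, leader, leader + 1) if 0 <= a < n_arms]
    return min(candidates, key=lambda a: imed_index(counts[a], means[a], best_mean))


if __name__ == "__main__":
    counts = [10, 100, 10, 1]
    means = [0.1, 0.5, 0.4, 0.2]
    print(choose_arm(counts, means))
```

In this sketch the leader's own index reduces to log N_a, so a heavily sampled leader eventually yields to a less explored neighbour whose empirical mean is close to the best one. Arms outside the neighbourhood are never considered, however small their index.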
Updated: 2020-07-01