On Information Gain and Regret Bounds in Gaussian Process Bandits
arXiv - CS - Information Theory Pub Date : 2020-09-15 , DOI: arxiv-2009.06966
Sattar Vakili, Kia Khezeli, Victor Picheny

Consider the sequential optimization of an expensive-to-evaluate and possibly non-convex objective function $f$ from noisy feedback, which can be viewed as a continuum-armed bandit problem. Upper bounds on the regret of several learning algorithms (GP-UCB, GP-TS, and their variants) are known under both a Bayesian setting (when $f$ is a sample from a Gaussian process (GP)) and a frequentist setting (when $f$ lives in a reproducing kernel Hilbert space). These regret bounds often rely on the maximal information gain $\gamma_T$ between $T$ observations and the underlying GP (surrogate) model. We provide general bounds on $\gamma_T$ based on the decay rate of the eigenvalues of the GP kernel; their specialisation to commonly used kernels improves the existing bounds on $\gamma_T$, and consequently the regret bounds that rely on $\gamma_T$, in numerous settings. For the Matérn family of kernels, where lower bounds on $\gamma_T$, and on the regret under the frequentist setting, are known, our results close a significant polynomial-in-$T$ gap between the upper and lower bounds (up to factors logarithmic in $T$).
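The quantity $\gamma_T$ at the heart of these bounds is the maximal mutual information between $T$ noisy observations and the GP model; for a fixed set of inputs with kernel matrix $K_T$ and noise variance $\sigma^2$, the information gain is $\frac{1}{2}\log\det(I + \sigma^{-2} K_T)$. The sketch below (not from the paper; kernel choice, lengthscale, and noise level are illustrative assumptions) computes this quantity for random 1-D inputs under an RBF kernel, using NumPy:

```python
import numpy as np

def rbf_kernel(X, lengthscale=0.2):
    """RBF (squared-exponential) kernel matrix for 1-D inputs X of shape (T,)."""
    # Pairwise squared distances between input points
    d2 = (X[:, None] - X[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

def information_gain(K, noise_var=0.1):
    """Information gain 0.5 * log det(I + sigma^-2 * K) of T observations."""
    T = K.shape[0]
    # slogdet is numerically stabler than log(det(...)) for large T
    _, logdet = np.linalg.slogdet(np.eye(T) + K / noise_var)
    return 0.5 * logdet

rng = np.random.default_rng(0)
for T in (10, 100, 1000):
    X = rng.uniform(0.0, 1.0, size=T)
    print(T, information_gain(rbf_kernel(X)))
```

For smooth kernels such as the RBF, the printed values grow only slowly in $T$, reflecting the fast eigenvalue decay that the paper's general bounds exploit; $\gamma_T$ is the maximum of this quantity over all size-$T$ input sets.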

Updated: 2020-10-12