Boundary Crossing Probabilities for General Exponential Families
Mathematical Methods of Statistics, Pub Date: 2018-05-11, DOI: 10.3103/s1066530718010015
O.-A. Maillard

We consider parametric exponential families of dimension K on the real line. We study a variant of boundary crossing probabilities coming from the multi-armed bandit literature, in the case when the real-valued distributions form an exponential family of dimension K. Formally, our result is a concentration inequality that bounds the probability that B^ψ(θ̂_n, θ*) ≥ f(t/n)/n, where θ* is the parameter of an unknown target distribution, θ̂_n is the empirical parameter estimate built from n observations, ψ is the log-partition function of the exponential family, and B^ψ is the corresponding Bregman divergence. From the perspective of stochastic multi-armed bandits, we pay special attention to the case when the boundary function f is logarithmic, as it enables the analysis of the regret of the state-of-the-art KL-ucb and KL-ucb+ strategies, whose analysis was left open in such generality. Indeed, previous results only hold for the case when K = 1, while we provide results for arbitrary finite dimension K, thus considerably extending the existing results. Perhaps surprisingly, we highlight that the proof techniques to achieve these strong results already existed three decades ago in the work of T. L. Lai, and were apparently forgotten in the bandit community. We provide a modern rewriting of these beautiful techniques, which we believe are useful beyond the application to stochastic multi-armed bandits.
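
To make the boundary-crossing event concrete, here is a minimal Python sketch for the simplest case, the Bernoulli family (K = 1). It is an illustration under stated assumptions, not the paper's construction: the argument convention for the Bregman divergence (chosen here so that it equals the Kullback-Leibler divergence from the first distribution to the second), the plain choice f = log, and all function names are assumptions introduced for this example.

```python
import math

def log_partition(theta):
    # psi(theta) = log(1 + e^theta): log-partition function of the
    # Bernoulli family written in its natural parameter theta.
    return math.log1p(math.exp(theta))

def grad_log_partition(theta):
    # psi'(theta) is the mean of the distribution, i.e. sigmoid(theta).
    return 1.0 / (1.0 + math.exp(-theta))

def bregman(theta_a, theta_b):
    # Assumed convention: B(a, b) = psi(b) - psi(a) - (b - a) * psi'(a),
    # which coincides with KL(P_a || P_b) in an exponential family.
    return (log_partition(theta_b) - log_partition(theta_a)
            - (theta_b - theta_a) * grad_log_partition(theta_a))

def natural_param(p):
    # Map a Bernoulli mean p in (0, 1) to its natural parameter.
    return math.log(p / (1.0 - p))

def bernoulli_kl(p, q):
    # Direct formula for KL(Bernoulli(p) || Bernoulli(q)), used as a check.
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def crosses_boundary(p_hat, p_star, n, t, f=math.log):
    # Event studied in the abstract: B(theta_hat_n, theta_star) >= f(t/n)/n.
    # f = log is a hypothetical stand-in for a logarithmic boundary function.
    divergence = bregman(natural_param(p_hat), natural_param(p_star))
    return divergence >= f(t / n) / n
```

With n = 100 observations and t = 1000, calling `crosses_boundary(p_hat, 0.5, 100, 1000)` compares the divergence between the empirical and true parameters against log(10)/100. The sketch requires means strictly between 0 and 1, since the natural parameter is undefined at the endpoints.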

Updated: 2018-05-11