Near-Optimal Coresets of Kernel Density Estimates
Discrete & Computational Geometry (IF 0.6), Pub Date: 2019-09-25, DOI: 10.1007/s00454-019-00134-6
Jeff M. Phillips, Wai Ming Tai

We construct near-optimal coresets for kernel density estimates for points in $$\mathbb{R}^d$$ when the kernel is positive definite. Specifically, we provide a polynomial-time construction for a coreset of size $$O(\sqrt{d}/\varepsilon \cdot \sqrt{\log 1/\varepsilon})$$, and we show a near-matching lower bound of size $$\Omega(\min\{\sqrt{d}/\varepsilon, 1/\varepsilon^2\})$$. When $$d \ge 1/\varepsilon^2$$, it is known that the size of the coreset can be $$O(1/\varepsilon^2)$$. The upper bound is a polynomial-in-$$(1/\varepsilon)$$ improvement when $$d \in [3, 1/\varepsilon^2)$$, and the lower bound is the first known lower bound to depend on $$d$$ for this problem. Moreover, the upper bound restriction that the kernel is positive definite is significant in that it applies to a wide variety of kernels, specifically those most important for machine learning. This includes kernels for information distances and the sinc kernel, which can be negative.
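To make the quantities in the abstract concrete, the following is a minimal Python sketch, not the authors' construction. It assumes a Gaussian kernel (one example of a positive definite kernel), measures the coreset error only over a finite set of query points, and uses a uniform random sample of size about $$1/\varepsilon^2$$ as a simple baseline. All function names (`gaussian_kernel`, `kde`, `random_sample_coreset`, `max_kde_error`, `coreset_size_bounds`) are hypothetical, and the bound formulas are evaluated with every hidden constant set to 1, so they indicate growth rates only.

```python
import math
import random


def gaussian_kernel(x, p, bandwidth=1.0):
    """Gaussian kernel exp(-||x - p||^2 / (2 h^2)); assumed here as an example kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def kde(points, x, bandwidth=1.0):
    """Kernel density estimate of `points` at query x: the average kernel value."""
    return sum(gaussian_kernel(x, p, bandwidth) for p in points) / len(points)


def random_sample_coreset(points, eps, rng):
    """Baseline only: uniform sample of about 1/eps^2 points (not the paper's method)."""
    size = min(len(points), math.ceil(1.0 / eps ** 2))
    return rng.sample(points, size)


def max_kde_error(points, coreset, queries, bandwidth=1.0):
    """Largest |KDE_P(x) - KDE_Q(x)| over a finite list of query points."""
    return max(
        abs(kde(points, q, bandwidth) - kde(coreset, q, bandwidth))
        for q in queries
    )


def coreset_size_bounds(d, eps):
    """Upper and lower bound formulas from the abstract, with constants set to 1."""
    upper = math.sqrt(d) / eps * math.sqrt(math.log(1.0 / eps))
    lower = min(math.sqrt(d) / eps, 1.0 / eps ** 2)
    return upper, lower


if __name__ == "__main__":
    rng = random.Random(0)
    d, eps = 3, 0.1
    points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(2000)]
    queries = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(200)]
    coreset = random_sample_coreset(points, eps, rng)
    print("coreset size:", len(coreset))
    print("max error on queries:", max_kde_error(points, coreset, queries))
    print("bound formulas (constants = 1):", coreset_size_bounds(d, eps))
```

A subset $$Q$$ of the input $$P$$ is an $$\varepsilon$$-coreset in this sense when the maximum difference between the two kernel density estimates is at most $$\varepsilon$$ over all queries. The sketch checks only sampled queries, so it can underestimate the true worst-case error.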

Updated: 2019-09-25