Optimal Rates of Distributed Regression with Imperfect Kernels
arXiv - CS - Distributed, Parallel, and Cluster Computing. Pub Date: 2020-06-30. DOI: arxiv-2006.16744
Hongwei Sun (University of Jinan) and Qiang Wu (Middle Tennessee State University)

Distributed machine learning systems have been receiving increasing attention for their efficiency in processing large-scale data. Many distributed frameworks have been proposed for different machine learning tasks. In this paper, we study distributed kernel regression via the divide-and-conquer approach. This approach has been proven asymptotically minimax optimal when the kernel is perfectly selected, so that the true regression function lies in the associated reproducing kernel Hilbert space. However, this is usually, if not always, impractical, because kernels selected via prior knowledge or a tuning process are hardly ever perfect. Instead, it is more common that the kernel is good enough but imperfect, in the sense that the true regression function can be well approximated by, but does not lie exactly in, the kernel space. We show that distributed kernel regression still achieves the capacity-independent optimal rate in this case. To this end, we first establish a general framework that allows us to analyze distributed regression with response-weighted base algorithms by bounding the error of such algorithms on a single data set, provided that the error bounds have factored in the impact of the unexplained variance of the response variable. We then perform a leave-one-out analysis of kernel ridge regression and bias-corrected kernel ridge regression, which, in combination with the aforementioned framework, allows us to derive sharp error bounds and capacity-independent optimal rates for the associated distributed kernel regression algorithms. As a byproduct of the thorough analysis, we also prove that kernel ridge regression can achieve rates faster than $N^{-1}$ (where $N$ is the sample size) in the noise-free setting; to the best of our knowledge, this is the first time such rates have been observed in regression learning.
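To make the divide-and-conquer scheme concrete, below is a minimal NumPy sketch of distributed kernel ridge regression: the data are split into m disjoint subsets, standard KRR is solved on each (alpha = (K + lambda*n*I)^{-1} y), and the local predictors are averaged. The Gaussian kernel, the target |x| (which lies outside the Gaussian RKHS, i.e. an imperfect-kernel scenario), and all parameter values are illustrative assumptions, not the paper's setup; the bias-corrected variant analyzed in the paper is omitted.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    # Gram matrix with entries k(a, b) = exp(-||a - b||^2 / (2 sigma^2)).
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def krr_fit(X, y, lam, sigma):
    # Kernel ridge regression on one partition:
    # solve (K + lam * n * I) alpha = y.
    n = len(X)
    K = gaussian_kernel(X, X, sigma)
    alpha = np.linalg.solve(K + lam * n * np.eye(n), y)
    return X, alpha

def krr_predict(model, Xtest, sigma):
    Xtrain, alpha = model
    return gaussian_kernel(Xtest, Xtrain, sigma) @ alpha

def distributed_krr(X, y, m, lam, sigma, rng):
    # Divide and conquer: fit KRR independently on m disjoint subsets.
    parts = np.array_split(rng.permutation(len(X)), m)
    return [krr_fit(X[idx], y[idx], lam, sigma) for idx in parts]

def distributed_predict(models, Xtest, sigma):
    # Final estimator: the average of the m local predictors.
    return np.mean([krr_predict(mod, Xtest, sigma) for mod in models], axis=0)

# Toy usage (hypothetical parameters): |x| is not in the Gaussian RKHS,
# so the kernel is "good enough but imperfect" in the paper's sense.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 1))
y = np.abs(X[:, 0]) + 0.1 * rng.standard_normal(2000)
models = distributed_krr(X, y, m=10, lam=1e-3, sigma=0.5, rng=rng)
Xtest = np.linspace(-1, 1, 5)[:, None]
print(distributed_predict(models, Xtest, sigma=0.5))
```

Each local solve costs O((N/m)^3) instead of O(N^3) for a single machine, which is the efficiency motivation mentioned above; the analysis in the paper concerns when this averaging loses nothing in statistical rate.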

Updated: 2020-07-01