Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model
arXiv - CS - Multiagent Systems, Pub Date: 2020-06-15, DOI: arxiv-2006.08212
Rapha\"el Berthier (PSL, SIERRA), Francis Bach (SIERRA, PSL), Pierre Gaillard (SIERRA, PSL, Thoth)

In the context of statistical supervised learning, the noiseless linear model assumes that there exists a deterministic linear relation $Y = \langle \theta_*, \Phi(U) \rangle$ between the random output $Y$ and the random feature vector $\Phi(U)$, a potentially non-linear transformation of the inputs $U$. We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-squares risk under this model. The convergence of the iterates to the optimum $\theta_*$ and the decay of the generalization error follow polynomial convergence rates with exponents that both depend on the regularities of the optimum $\theta_*$ and of the feature vectors $\Phi(u)$. We interpret our result in the reproducing kernel Hilbert space framework. As a special case, we analyze an online algorithm for estimating a real function on the unit interval from the noiseless observation of its value at randomly sampled points; the convergence depends on the Sobolev smoothness of the function and of a chosen kernel. Finally, we apply our analysis beyond the supervised learning setting to obtain convergence rates for the averaging process (a.k.a. gossip algorithm) on a graph, depending on its spectral dimension.
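As an illustration of the algorithm analyzed above, here is a minimal Python/NumPy sketch of single-pass, fixed step-size SGD on the least-squares risk under a noiseless linear model. The feature map, dimension, step size, and sample count are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Minimal sketch: single-pass, fixed step-size SGD on the least-squares risk
# under a noiseless linear model Y = <theta_*, Phi(U)>.
# The feature map, dimension d, step size gamma, and sample count n below
# are illustrative choices, not values from the paper.

rng = np.random.default_rng(0)

d = 50                                          # feature dimension (illustrative)
theta_star = rng.normal(size=d) / np.sqrt(d)    # ground-truth parameter theta_*

def features(u):
    """Illustrative feature map Phi(u) with decaying coordinates."""
    freqs = np.arange(1, d + 1)
    return np.cos(freqs * u) / freqs

gamma = 0.5            # fixed step size (illustrative)
theta = np.zeros(d)    # SGD iterate
n = 10_000             # single pass over n i.i.d. samples

for _ in range(n):
    u = rng.uniform()              # sample an input U on the unit interval
    phi = features(u)              # feature vector Phi(U)
    y = theta_star @ phi           # noiseless output Y = <theta_*, Phi(U)>
    residual = theta @ phi - y     # prediction error on this sample
    theta -= gamma * residual * phi  # SGD step on the least-squares loss

print("parameter error ||theta - theta_*|| =", np.linalg.norm(theta - theta_star))
```

With decaying feature coordinates such as these, the error of the iterates shrinks at a polynomial rate in the number of samples, which is the regime the paper's rates quantify (here as a rough empirical check, not a reproduction of the paper's setting).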

Updated: 2020-10-28