Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model
arXiv - CS - Multiagent Systems, Pub Date: 2020-06-15, DOI: arxiv-2006.08212
Rapha\"el Berthier (PSL, SIERRA), Francis Bach (SIERRA, PSL), Pierre Gaillard (SIERRA, PSL, Thoth)

In the context of statistical supervised learning, the noiseless linear model assumes that there exists a deterministic linear relation $Y = \langle \theta_*, \Phi(U) \rangle$ between the random output $Y$ and the random feature vector $\Phi(U)$, a potentially non-linear transformation of the inputs $U$. We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-squares risk under this model. The convergence of the iterates to the optimum $\theta_*$ and the decay of the generalization error follow polynomial convergence rates with exponents that both depend on the regularities of the optimum $\theta_*$ and of the feature vectors $\Phi(u)$. We interpret our result in the reproducing kernel Hilbert space framework. As a special case, we analyze an online algorithm for estimating a real function on the unit interval from the noiseless observation of its value at randomly sampled points; the convergence depends on the Sobolev smoothness of the function and of a chosen kernel. Finally, we apply our analysis beyond the supervised learning setting to obtain convergence rates for the averaging process (a.k.a. gossip algorithm) on a graph, depending on its spectral dimension.
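As an illustration of the algorithm analyzed above, here is a minimal Python/NumPy sketch of single-pass, fixed step-size SGD on the least-squares risk under a noiseless linear model. The feature map, dimension, step size, and sample count are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Minimal sketch: single-pass, fixed step-size SGD on the least-squares risk
# under a noiseless linear model Y = <theta_*, Phi(U)>.
# The feature map, dimension d, step size gamma, and sample count n below
# are illustrative choices, not values from the paper.

rng = np.random.default_rng(0)

d = 50                                          # feature dimension (illustrative)
theta_star = rng.normal(size=d) / np.sqrt(d)    # ground-truth parameter theta_*

def features(u):
    """Illustrative feature map Phi(u) with decaying coordinates."""
    freqs = np.arange(1, d + 1)
    return np.cos(freqs * u) / freqs

gamma = 0.5            # fixed step size (illustrative)
theta = np.zeros(d)    # SGD iterate
n = 10_000             # single pass over n i.i.d. samples

for _ in range(n):
    u = rng.uniform()              # sample an input U on the unit interval
    phi = features(u)              # feature vector Phi(U)
    y = theta_star @ phi           # noiseless output Y = <theta_*, Phi(U)>
    residual = theta @ phi - y     # prediction error on this sample
    theta -= gamma * residual * phi  # SGD step on the least-squares loss

print("parameter error ||theta - theta_*|| =", np.linalg.norm(theta - theta_star))
```

With decaying feature coordinates such as these, the error of the iterates shrinks at a polynomial rate in the number of samples, which is the regime the paper's rates quantify (here as a rough empirical check, not a reproduction of the paper's setting).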

Updated: 2020-10-28