When will gradient methods converge to max‐margin classifier under ReLU models?
Stat (IF 0.7) Pub Date: 2020-12-31, DOI: 10.1002/sta4.354
Tengyu Xu, Yi Zhou, Kaiyi Ji, Yingbin Liang

We study the implicit bias of gradient descent methods in solving a binary classification problem over a linearly separable data set. The classifier is described by a non-linear ReLU model and the objective function adopts the exponential loss function. We first characterize the landscape of the loss function and show that there can exist spurious asymptotic local minima besides asymptotic global minima. We then show that gradient descent (GD) can converge to either a global or a local max-margin direction, or may diverge from the desired max-margin direction in a general context. For stochastic gradient descent (SGD), we show that it converges in expectation to either the global or the local max-margin direction if SGD converges. We further explore the implicit bias of these algorithms in learning a multi-neuron network under certain stationary conditions and show that the learned classifier maximizes the margins of each sample pattern partition under the ReLU activation.
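
To make the setting concrete, the following is a minimal illustrative sketch (not the authors' code): gradient descent on the exponential loss over a single-ReLU-neuron classifier and a linearly separable toy data set, tracking the normalized weight direction w/||w||. The data, step size, and iteration count are hypothetical choices for demonstration only.

import numpy as np

rng = np.random.default_rng(0)

# Linearly separable toy data: the label is the sign of the first coordinate,
# which is pushed away from zero so a positive margin exists.
n, d = 200, 5
X = rng.normal(size=(n, d))
X[:, 0] += np.sign(X[:, 0])
y = np.sign(X[:, 0])

def loss_and_grad(w):
    z = X @ w                          # pre-activations w^T x_i
    act = np.maximum(z, 0.0)           # ReLU(w^T x_i)
    e = np.exp(-y * act)               # exponential loss terms
    # Subgradient of ReLU: 1 where the pre-activation is positive, else 0.
    grad = -(e * y * (z > 0)) @ X / n
    return e.mean(), grad

w = 0.1 * rng.normal(size=d)
lr = 0.1
prev_dir = None
for t in range(20001):
    loss, grad = loss_and_grad(w)
    w -= lr * grad                     # plain gradient descent step
    if t % 5000 == 0:
        cur_dir = w / np.linalg.norm(w)
        drift = 0.0 if prev_dir is None else np.linalg.norm(cur_dir - prev_dir)
        prev_dir = cur_dir
        print(f"iter {t:6d}  loss {loss:.4f}  ||w|| {np.linalg.norm(w):.2f}  "
              f"direction drift {drift:.2e}")

In such a toy run the loss decreases toward its asymptotic minimum while ||w|| keeps growing, and the drift of w/||w|| shrinks, illustrating the kind of convergence of the weight direction (rather than of w itself) that the paper analyzes.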

Updated: 2021-03-10