Learning Two-Layer ReLU Networks Is Nearly as Easy as Learning Linear Classifiers on Separable Data
IEEE Transactions on Signal Processing (IF 4.6), Pub Date: 2021-07-07, DOI: 10.1109/tsp.2021.3094911
Qiuling Yang, Alireza Sadeghi, Gang Wang, Jian Sun

Neural networks with the non-linear rectified linear unit (ReLU) activation function have demonstrated remarkable performance in many fields. It has been observed that a sufficiently wide and/or deep ReLU network can accurately fit the training data while incurring a small generalization error on the testing data. Nevertheless, existing analytical results on provably training ReLU networks are mostly limited to over-parameterized cases, or they require assumptions on the data distribution. In this paper, training a two-layer ReLU network for binary classification of linearly separable data is revisited. Adopting the hinge loss as the classification criterion yields a non-convex objective function with infinitely many local minima and saddle points. Instead, a modified loss is proposed which enables (stochastic) gradient descent to attain a globally optimal solution. Enticingly, the solution found is also globally optimal for the hinge loss. In addition, an upper bound on the number of iterations required to find a global minimum is derived. To ensure generalization performance, a convex max-margin formulation for two-layer ReLU network classifiers is discussed. Connections between the sought max-margin ReLU network and the max-margin support vector machine are drawn. Finally, an algorithm-dependent theoretical quantification of the generalization performance is developed using classical compression bounds. Numerical tests using synthetic and real data validate the analytical results.
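
To make the problem setting concrete, the following is a minimal sketch of the setup described in the abstract: a two-layer ReLU network f(x) = v'max(0, Wx) trained by plain stochastic (sub)gradient descent on the standard hinge loss max(0, 1 - y f(x)) over synthetic linearly separable data. It illustrates the baseline formulation only; the paper's modified loss and its convergence guarantees are not specified in the abstract and are not implemented here. All concrete choices (data generator, hidden width, learning rate, iteration count) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linearly separable data: labels in {-1, +1} from a ground-truth
# hyperplane w_star (hypothetical setup, for illustration only).
n, d, k = 200, 10, 16          # samples, input dimension, hidden width (assumed)
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)
y[y == 0] = 1.0

# Two-layer ReLU network f(x) = v^T max(0, W x), trained with the hinge loss.
W = rng.normal(scale=0.5, size=(k, d))
v = rng.normal(scale=0.5, size=k)

lr = 1e-2
for epoch in range(200):
    for i in rng.permutation(n):
        x_i, y_i = X[i], y[i]
        z = W @ x_i                      # pre-activations
        h = np.maximum(z, 0.0)           # ReLU hidden layer
        f = v @ h                        # network output
        if 1.0 - y_i * f > 0.0:          # hinge loss active: nonzero (sub)gradient
            # Subgradients of max(0, 1 - y f) with respect to v and W
            grad_v = -y_i * h
            grad_W = -y_i * np.outer(v * (z > 0.0), x_i)
            v -= lr * grad_v
            W -= lr * grad_W

margins = y * (np.maximum(X @ W.T, 0.0) @ v)
print(f"training error: {np.mean(margins <= 0.0):.3f}, "
      f"smallest margin: {margins.min():.3f}")
```

On separable data this objective admits many spurious stationary points (for instance, configurations where all hidden units are inactive on the misclassified samples), which is the difficulty the paper's modified loss is designed to avoid.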

Updated: 2021-07-07