Gradient descent optimizes over-parameterized deep ReLU networks
Machine Learning (IF 2.809), Pub Date: 2019-10-23, DOI: 10.1007/s10994-019-05839-6
Difan Zou, Yuan Cao, Dongruo Zhou, Quanquan Gu

We study the problem of training deep fully connected neural networks with the Rectified Linear Unit (ReLU) activation function and the cross-entropy loss function for binary classification using gradient descent. We show that with proper random weight initialization, gradient descent can find the global minima of the training loss for an over-parameterized deep ReLU network, under certain assumptions on the training data. The key idea of our proof is that Gaussian random initialization followed by gradient descent produces a sequence of iterates that stay inside a small perturbation region centered at the initial weights, in which the training loss function of the deep ReLU network enjoys nice local curvature properties that ensure the global convergence of gradient descent. At the core of our proof technique are (1) a milder assumption on the training data; (2) a sharp analysis of the trajectory length for gradient descent; and (3) a finer characterization of the size of the perturbation region. Compared with the concurrent work (Allen-Zhu et al. in A convergence theory for deep learning via over-parameterization, 2018a; Du et al. in Gradient descent finds global minima of deep neural networks, 2018a) along this line, our result relies on a milder over-parameterization condition on the neural network width, and enjoys a faster global convergence rate of gradient descent for training deep neural networks.
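As a rough illustration of the training setting the abstract describes (not the authors' code), the sketch below trains a wide, deep, fully connected ReLU network with Gaussian-initialized weights using plain full-batch gradient descent on a binary cross-entropy loss, and monitors how far the weights move from their initialization. The network width, depth, learning rate, and the synthetic separated data are illustrative assumptions, not values or conditions taken from the paper.

```python
# Minimal sketch of the setting: over-parameterized deep ReLU network,
# Gaussian random initialization, full-batch gradient descent, binary
# cross-entropy loss. All hyperparameters below are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)

n, d, width, depth = 100, 10, 1024, 4  # samples, input dim, hidden width, hidden layers

# Synthetic, well-separated binary data (the analysis assumes a mild
# separation-type condition on the training data).
X = torch.randn(n, d)
y = (X[:, 0] > 0).float()

# Deep fully connected ReLU network with Gaussian-initialized weights.
layers, in_dim = [], d
for _ in range(depth):
    lin = nn.Linear(in_dim, width, bias=False)
    nn.init.normal_(lin.weight, std=(2.0 / in_dim) ** 0.5)  # He-style Gaussian init
    layers += [lin, nn.ReLU()]
    in_dim = width
out = nn.Linear(in_dim, 1, bias=False)
nn.init.normal_(out.weight, std=(1.0 / in_dim) ** 0.5)
net = nn.Sequential(*layers, out)

# Snapshot the initialization so we can track how far gradient descent moves.
init_params = [p.detach().clone() for p in net.parameters()]

loss_fn = nn.BCEWithLogitsLoss()
opt = torch.optim.SGD(net.parameters(), lr=0.1)  # plain (full-batch) gradient descent

for step in range(201):
    opt.zero_grad()
    loss = loss_fn(net(X).squeeze(-1), y)
    loss.backward()
    opt.step()
    if step % 50 == 0:
        # Distance of the current weights from initialization; the theory says the
        # iterates stay inside a small perturbation region around the initial weights.
        dist = sum((p - p0).norm() ** 2 for p, p0 in zip(net.parameters(), init_params)) ** 0.5
        print(f"step {step:3d}  loss {loss.item():.4f}  ||W - W0||_F {dist.item():.3f}")
```

Running this, the training loss decreases toward zero while the weight distance from initialization stays small relative to the (large) width, which is the empirical behavior the local-curvature argument in the abstract relies on.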
Updated: 2020-04-22

 
