Multicomposite Nonconvex Optimization for Training Deep Neural Networks
SIAM Journal on Optimization (IF 2.6), Pub Date: 2020-06-18, DOI: 10.1137/18m1231559
Ying Cui, Ziyu He, Jong-Shi Pang

SIAM Journal on Optimization, Volume 30, Issue 2, Pages 1693-1723, January 2020.
We present in this paper a novel deterministic algorithmic framework that enables the computation of a directional stationary solution of the empirical deep neural network training problem, formulated as a multicomposite optimization problem with coupled nonconvexity and nondifferentiability. To our knowledge, this is the first time such a sharp kind of stationary solution has been shown to be provably computable for a nonsmooth deep neural network. Allowing for arbitrary finite numbers of input samples and training layers, an arbitrary number of neurons within each layer, and arbitrary piecewise activation functions, the proposed approach combines the methods of exact penalization, majorization-minimization, gradient projection with enhancements, and the dual semismooth Newton method, each serving a particular purpose in the overall computational scheme. While a routine implementation of the semismooth Newton method would be computationally expensive, we show that a careful linear algebraic implementation greatly reduces the computational and storage costs for problems of arbitrary dimensions. Contrary to existing stochastic approaches, which provide at best very weak guarantees on the solutions computed in practical implementations, our rigorous deterministic treatment guarantees the stationarity properties of the computed solutions with respect to the optimization problems being solved. Numerical results from a MATLAB implementation demonstrate the effectiveness of the framework for solving reasonably sized networks with a modest number of training samples (in the low thousands).
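For context, a directional stationary (d-stationary) point is the sharp notion referenced above: a point w̄ is d-stationary for minimizing f over a closed convex set W if the directional derivative satisfies f′(w̄; w − w̄) ≥ 0 for every w ∈ W, a strictly stronger requirement than Clarke or limiting stationarity when f is nonsmooth and nonconvex.

The sketch below is not the paper's algorithm; it only illustrates the majorization-minimization ingredient named in the abstract on a toy one-layer problem with a piecewise-linear (ReLU) activation. Under the assumption of nonnegative targets, the squared loss splits into a convex part plus a concave part whose linearization at the current iterate yields a smooth convex majorizer. All names (mm_train, relu) and the problem setup are illustrative assumptions.

import numpy as np

# Minimal MM sketch (not the paper's full framework): solve approximately
#   min_w  f(w) = sum_i ( relu(x_i^T w) - y_i )^2,   with y_i >= 0,
# using the difference-of-convex split
#   f(w) = sum_i [ relu(x_i^T w)^2 + y_i^2 ]   (convex)
#        - sum_i 2 y_i relu(x_i^T w)           (concave, since y_i >= 0).
# Linearizing the concave part with a subgradient of relu gives a smooth
# convex majorizer, minimized approximately by gradient descent.

def relu(t):
    return np.maximum(t, 0.0)

def mm_train(X, y, w, outer=20, inner=200, lr=1e-3):
    """One-layer illustration: X is (n, d), y is (n,) with y >= 0."""
    for _ in range(outer):
        # Fix a subgradient of relu at the current iterate: 1 if active, else 0.
        g = (X @ w > 0).astype(float)
        for _ in range(inner):
            t = X @ w
            # Gradient of the convex surrogate
            #   sum_i relu(t_i)^2 - 2 y_i g_i t_i + const:
            grad = X.T @ (2.0 * relu(t) - 2.0 * y * g)
            w = w - lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = relu(X @ w_true)                 # targets are nonnegative by construction
w = mm_train(X, y, w=np.zeros(5))
print("fit error:", np.mean((relu(X @ w) - y) ** 2))

Each outer pass re-fixes the active set (the ReLU subgradient) and then decreases a smooth convex surrogate that touches the true loss at the current iterate, so the true loss is nonincreasing across outer iterations; this descent property is the basic MM guarantee the abstract alludes to.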


Updated: 2020-07-23