Abstract
Estimating the weights of deep feedforward neural networks (DFNNs) requires solving a very large nonconvex optimization problem that may have many local (non-global) minimizers, saddle points, and large plateaus. Moreover, the time needed to find good solutions of the training problem depends heavily on both the number of samples and the number of weights (variables). In this work, we show how block coordinate descent (BCD) methods can be fruitfully applied to the DFNN weight optimization problem and embedded in online frameworks, possibly avoiding bad stationary points. We first describe a batch BCD method able to effectively tackle the difficulties due to the network's depth; we then extend the algorithm, proposing an online BCD scheme able to scale with respect to both the number of variables and the number of samples. We report extensive numerical experiments on standard datasets using various deep networks, showing that applying BCD methods to the training problem of DFNNs improves over standard batch/online algorithms in the training phase while also guaranteeing good generalization performance.
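The core idea of a layer-wise BCD scheme, as described in the abstract, is to partition the weights by layer and update one layer at a time while the others are kept frozen. The following is a minimal NumPy sketch under that interpretation, on a two-layer tanh network with squared loss; the data, step size, and per-block gradient updates are illustrative assumptions, not the authors' exact scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))          # training samples
y = rng.normal(size=(64, 1))          # targets

W1 = 0.1 * rng.normal(size=(5, 8))    # block 1: hidden-layer weights
W2 = 0.1 * rng.normal(size=(8, 1))    # block 2: output-layer weights

def forward(W1, W2):
    H = np.tanh(X @ W1)               # hidden activations
    return H, H @ W2                  # activations and predictions

def loss(W1, W2):
    _, pred = forward(W1, W2)
    return 0.5 * np.mean((pred - y) ** 2)

lr = 0.1
loss_init = loss(W1, W2)
for epoch in range(200):
    # Block update 1: output layer W2, with W1 frozen
    H, pred = forward(W1, W2)
    grad_W2 = H.T @ (pred - y) / len(X)
    W2 -= lr * grad_W2
    # Block update 2: hidden layer W1, with the updated W2 frozen
    H, pred = forward(W1, W2)
    delta = ((pred - y) @ W2.T) * (1.0 - H ** 2)   # backprop through tanh
    grad_W1 = X.T @ delta / len(X)
    W1 -= lr * grad_W1
loss_final = loss(W1, W2)
```

Cycling through the blocks means each inner step solves (approximately) a much smaller problem than full gradient descent on all weights at once; an online variant would additionally subsample the rows of `X` at each block update.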
Laura Palagi was partially supported by the project Distributed optimization algorithms for Big Data of Sapienza No. RM11715C7E49E89C.
Cite this article
Palagi, L., Seccia, R. Block layer decomposition schemes for training deep neural networks. J Glob Optim 77, 97–124 (2020). https://doi.org/10.1007/s10898-019-00856-0