Training deep neural networks: a static load balancing approach

The Journal of Supercomputing 76, 9739–9754 (2020)

Abstract

Deep neural networks are currently trained in data-parallel setups on high-performance computing (HPC) platforms: a replica of the full model is assigned to each computational resource, and each replica processes non-overlapping subsets of the data known as batches. At the end of each batch, the replicas combine their computed gradients to update their local copies of the model parameters. However, on current heterogeneous platforms, differences in the performance of the resources assigned to the replicas induce waiting times when gradients are combined synchronously, degrading overall performance. Although asynchronous communication of gradients has been proposed as an alternative, it suffers from the so-called staleness problem: the training in each replica is computed using a stale version of the parameters, which negatively impacts the accuracy of the resulting model. In this work, we study the application of well-known HPC static load balancing techniques to the distributed training of deep models. Our approach assigns to each replica a batch size proportional to its relative computing capacity, thereby minimizing the staleness problem. Our experimental results, obtained in the context of a remotely sensed hyperspectral image processing application, show that the classification accuracy remains constant while the training time decreases substantially with respect to unbalanced training. This is illustrated on heterogeneous computing platforms made up of CPUs and GPUs with different performance.
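
To make the balancing scheme concrete, the following minimal Python sketch splits a global batch across replicas in proportion to their relative computing capacities, so that every replica finishes its local batch in roughly the same time and the synchronous gradient combination no longer waits for the slowest worker. It is an illustration only, not the authors' released implementation; the function name balanced_batch_sizes and the example speeds are assumptions.

    def balanced_batch_sizes(global_batch, relative_speeds):
        """Split a global batch across replicas in proportion to their speed."""
        total = sum(relative_speeds)
        sizes = [int(global_batch * s / total) for s in relative_speeds]
        # Hand out any samples lost to integer truncation, one at a time.
        for i in range(global_batch - sum(sizes)):
            sizes[i % len(sizes)] += 1
        return sizes

    # Hypothetical platform: one fast GPU and two slower CPU workers sharing a batch of 512.
    print(balanced_batch_sizes(512, [8.0, 1.0, 1.0]))  # -> [410, 51, 51]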

Notes

  1. The source code is available at https://github.com/mhaut/static_load_deeplearning.

Acknowledgements

This work was jointly supported by the following projects and institutions: (1) the European Regional Development Fund 'A way to achieve Europe' (ERDF) and the Extremadura Local Government (Ref. IB16118); (2) the Ministry of Education (Resolution of November 19, 2015, of the Secretary of State for Education, Vocational Training and Universities), under grant FPU15/02090; and (3) the computing facilities of the Extremadura Research Center for Advanced Technologies (CETA-CIEMAT), funded by the European Regional Development Fund (ERDF).

Author information

Corresponding author

Correspondence to Sergio Moreno-Álvarez.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Moreno-Álvarez, S., Haut, J.M., Paoletti, M.E. et al. Training deep neural networks: a static load balancing approach. J Supercomput 76, 9739–9754 (2020). https://doi.org/10.1007/s11227-020-03200-6
