
Effective Scheduler for Distributed DNN Training Based on MapReduce and GPU Cluster

Journal of Grid Computing

Abstract

Parallel training accelerates Deep Neural Network (DNN) training by running it on multiple GPUs in parallel. However, when the GPUs are distributed across different nodes, in-memory data transmission becomes cross-node network transmission, which prolongs training time. Most research addresses this by reducing the amount of data sent over network links, while the factor of network distance is ignored. In this paper, we construct a distributed DNN training architecture based on MapReduce. A customized scheduler is designed to place the computation nodes that perform the training closer to the nodes that store the data. At the same time, the parallel training models are kept synchronized by adjusting the data transmission time. The experimental results show that the shortened network distance reduces network traffic usage. The resulting reduction in data transmission time decreases the training time by at least 50% and guarantees synchronization of the parallel training.
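
To illustrate the scheduling idea described in the abstract, the following is a minimal Python sketch of a locality-aware placement policy: candidate GPU nodes are scored by an HDFS-style network distance to the nodes holding the input data, and the closest candidate with free GPUs is chosen. The distance model, data structures, and all names here are illustrative assumptions, not the paper's actual scheduler.

# Minimal sketch of a locality-aware placement policy in the spirit of the
# paper's scheduler: prefer GPU nodes that are network-close to the nodes
# storing the training data. The node/rack distance model and all names are
# illustrative assumptions, not the authors' implementation.

from dataclasses import dataclass


@dataclass
class Node:
    name: str
    rack: str
    free_gpus: int


def network_distance(a: Node, b: Node) -> int:
    """HDFS-style distance: 0 same node, 2 same rack, 4 across racks."""
    if a.name == b.name:
        return 0
    if a.rack == b.rack:
        return 2
    return 4


def place_task(data_nodes: list[Node], candidates: list[Node]) -> Node:
    """Pick the candidate GPU node with the smallest total network distance
    to all replicas of the task's input data."""
    eligible = [c for c in candidates if c.free_gpus > 0]
    if not eligible:
        raise RuntimeError("no GPU node available")
    return min(
        eligible,
        key=lambda c: sum(network_distance(c, d) for d in data_nodes),
    )


if __name__ == "__main__":
    data = [Node("n1", "rack1", 0), Node("n3", "rack2", 0)]
    gpus = [Node("n2", "rack1", 2), Node("n4", "rack3", 4)]
    print(place_task(data, gpus).name)  # prints "n2": same rack as one data replica

In this toy example, node n2 wins because it shares a rack with one of the data replicas, so a map task scheduled there would pull less traffic across rack boundaries; the paper's scheduler pursues the same goal within a MapReduce/GPU cluster.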



Acknowledgments

This work was jointly funded by the National Natural Science Foundation of China (No. 61671079, No. 61771068) and the Beijing Municipal Natural Science Foundation (No. 418-2041).

Author information


Corresponding author

Correspondence to Jie Xu.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Xu, J., Wang, J., Qi, Q. et al. Effective Scheduler for Distributed DNN Training Based on MapReduce and GPU Cluster. J Grid Computing 19, 8 (2021). https://doi.org/10.1007/s10723-021-09550-6


  • DOI: https://doi.org/10.1007/s10723-021-09550-6
