
Effective Scheduler for Distributed DNN Training Based on MapReduce and GPU Cluster

Journal of Grid Computing

Abstract

Parallel training accelerates Deep Neural Network (DNN) training by running it on multiple GPUs in parallel. However, when the GPUs are distributed across different nodes, in-memory data transmission becomes cross-node network transmission, which prolongs training time. Most research addresses this by reducing the amount of data sent over network links, while the factor of network distance is ignored. In this paper, we construct a distributed DNN training architecture based on MapReduce. A customized scheduler is designed to place the computation nodes that perform the training closer to the nodes that store the data. At the same time, the parallel training models are kept synchronized by adjusting the data transmission time. The experimental results show that the shortened network distance reduces network traffic usage. The resulting reduction in data transmission time decreases the training time by at least 50% and guarantees synchronization of the parallel training.
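
To illustrate the scheduling idea described in the abstract, the following is a minimal Python sketch of a locality-aware placement policy: candidate GPU nodes are scored by an HDFS-style network distance to the nodes holding the input data, and the closest candidate with free GPUs is chosen. The distance model, data structures, and all names here are illustrative assumptions, not the paper's actual scheduler.

# Minimal sketch of a locality-aware placement policy in the spirit of the
# paper's scheduler: prefer GPU nodes that are network-close to the nodes
# storing the training data. The node/rack distance model and all names are
# illustrative assumptions, not the authors' implementation.

from dataclasses import dataclass


@dataclass
class Node:
    name: str
    rack: str
    free_gpus: int


def network_distance(a: Node, b: Node) -> int:
    """HDFS-style distance: 0 same node, 2 same rack, 4 across racks."""
    if a.name == b.name:
        return 0
    if a.rack == b.rack:
        return 2
    return 4


def place_task(data_nodes: list[Node], candidates: list[Node]) -> Node:
    """Pick the candidate GPU node with the smallest total network distance
    to all replicas of the task's input data."""
    eligible = [c for c in candidates if c.free_gpus > 0]
    if not eligible:
        raise RuntimeError("no GPU node available")
    return min(
        eligible,
        key=lambda c: sum(network_distance(c, d) for d in data_nodes),
    )


if __name__ == "__main__":
    data = [Node("n1", "rack1", 0), Node("n3", "rack2", 0)]
    gpus = [Node("n2", "rack1", 2), Node("n4", "rack3", 4)]
    print(place_task(data, gpus).name)  # prints "n2": same rack as one data replica

In this toy example, node n2 wins because it shares a rack with one of the data replicas, so a map task scheduled there would pull less traffic across rack boundaries; the paper's scheduler pursues the same goal within a MapReduce/GPU cluster.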



Acknowledgments

This work was jointly funded by the National Natural Science Foundation of China (No. 61671079, No. 61771068) and the Beijing Municipal Natural Science Foundation (No. 418-2041).

Author information


Corresponding author

Correspondence to Jie Xu.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Xu, J., Wang, J., Qi, Q. et al. Effective Scheduler for Distributed DNN Training Based on MapReduce and GPU Cluster. J Grid Computing 19, 8 (2021). https://doi.org/10.1007/s10723-021-09550-6


  • DOI: https://doi.org/10.1007/s10723-021-09550-6
