RETRACTED ARTICLE: Detecting straggler MapReduce tasks in big data processing infrastructure by neural network

Javadpour, Amir; Wang, Guojun; Rezaei, Samira; Li, Kuan-Ching

doi:10.1007/s11227-019-03136-6

Published: 09 January 2020

Volume 76, pages 6969–6993 (2020)

The Journal of Supercomputing

This article was retracted on 08 April 2024

This article has been updated

Abstract

Straggler task detection is one of the main challenges in applying MapReduce to parallelize and distribute large-scale data processing. It is defined as detecting tasks that are running on weak nodes. Considering the two stages of the Map phase (copy, combine) and the three stages of the Reduce phase (shuffle, sort and reduce), the total execution time is the sum of the execution times of these five stages. The primary purpose of this paper is to estimate the execution time of each stage correctly, so that the total execution time is estimated correctly. The proposed method applies a backpropagation neural network on Hadoop to estimate the remaining execution time of tasks, a quantity that is central to straggler task detection. The results were compared with popular algorithms in this domain, such as LATE and ESAMR, and with the real remaining time on the WordCount and Sort benchmarks; they show that the method detects straggler tasks and estimates execution time accurately. In addition, it helps accelerate task execution.
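As a rough illustration of the abstract's central idea, the following is a minimal, standard-library-only Python sketch of a backpropagation network that regresses a task's remaining execution time. It is not the authors' network: the two input features (progress score and normalized elapsed time), the six-unit hidden layer, the learning rate, and the synthetic training data (which assume that progress is linear in time) are all hypothetical choices.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyBackpropNet:
    """One sigmoid hidden layer and a linear output unit, trained by
    stochastic gradient descent on the squared error (backpropagation)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        output = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, output

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, target, lr):
        hidden, output = self.forward(x)
        err = output - target  # dL/d(output) for L = 0.5 * (output - target)^2
        for j, h in enumerate(hidden):
            # Error signal at hidden unit j, computed with the old w2[j].
            delta = err * self.w2[j] * h * (1.0 - h)
            self.w2[j] -= lr * err * h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * delta * xi
            self.b1[j] -= lr * delta
        self.b2 -= lr * err


def make_synthetic_tasks(n, seed=1):
    """Hypothetical data: a task with normalized total time T that has reached
    progress p has spent p*T and has (1 - p)*T remaining (linear progress)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        total = rng.uniform(0.5, 1.0)
        p = rng.uniform(0.1, 0.9)
        data.append(((p, p * total), (1.0 - p) * total))
    return data


def mse(net, data):
    return sum((net.predict(x) - t) ** 2 for x, t in data) / len(data)


def train(net, data, epochs, lr=0.1):
    for _ in range(epochs):
        for x, t in data:
            net.train_step(x, t, lr)


if __name__ == "__main__":
    data = make_synthetic_tasks(200)
    net = TinyBackpropNet(n_in=2, n_hidden=6)
    before = mse(net, data)
    train(net, data, epochs=200)
    print(f"MSE before: {before:.4f}  after: {mse(net, data):.4f}")
    # Task at 50% progress after 0.4 time units; under the linear-progress
    # assumption its true remaining time is 0.4.
    print(f"remaining estimate: {net.predict((0.5, 0.4)):.3f}")
```

On real Hadoop traces the features and targets would come from job history rather than from the hypothetical `make_synthetic_tasks`; the sketch only shows the forward pass, the backpropagated error signal, and the gradient update.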

Change history

08 April 2024
This article has been retracted. Please see the Retraction Notice for more detail: https://doi.org/10.1007/s11227-024-06126-5

References

Nanduri R, Maheshwari N, Reddyraja A, Varma V (2011) Job aware scheduling algorithm for mapreduce framework. In: 2011 IEEE Third International Conference on Cloud Computing Technology and Science (CloudCom), pp 724–729
Song G, Meng Z, Huet F, Magoules F, Yu L, Lin X (2013) A hadoop mapreduce performance prediction method. In: 2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, pp 820–825
Javadpour A (2019) An optimize-aware target tracking method combining MAC layer and active nodes in wireless sensor networks. Wirel Pers Commun 1–18
Javadpour A (2019) Providing a way to create balance between reliability and delays in SDN networks by using the appropriate placement of controllers. Wirel Pers Commun
Javadpour A, Kazemi Abharian S, Wang G (2017) Feature selection and intrusion detection in cloud environment based on machine learning algorithms. In: 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications & 2017 IEEE International Conference on Ubiquitous Computing and Communications, pp 1417–1421
Kaur N, Sood SK (2017) An energy-efficient architecture for the Internet of Things (IoT). IEEE Syst J 11(2):796–805
Javadpour A, Mohammadi A (2016) Improving brain magnetic resonance image (MRI) segmentation via a novel algorithm based on genetic and regional growth. J Biomed Phys Eng 6(2):95–108
Javadpour A, Memarzadeh-Tehran H (2015) A wearable medical sensor for provisional healthcare. In: ISPTS 2015—2nd International Symposium on Physics and Technology of Sensors: Dive Deep Into Sensors, Proceedings, pp 293–296
Javadpour A, Memarzadeh-Tehran H, Saghafi F (2015) A temperature monitoring system incorporating an array of precision wireless thermometers. In: 2015 International Conference on Smart Sensors and Application (ICSSA), pp 155–160
Rezaei S, Radmanesh H, Alavizadeh P, Nikoofar H, Lahouti F (2016) Automatic fault detection and diagnosis in cellular networks using operations support systems data. In: NOMS 2016—2016 IEEE/IFIP Network Operations and Management Symposium, pp 468–473
Park JJ, Adeli H, Park N, Woungang I (2013) Mobile, ubiquitous, and intelligent computing: MUSIC 2013. Springer, Berlin
Zhang Z et al (2015) Scientific computing meets big data technology: an astronomy use case. In: 2015 IEEE International Conference on Big Data (Big Data), pp 918–927
Aggarwal VB, Bhatnagar V, Mishra DK (2017) Big Data Analytics: Proceedings of CSI 2015. Springer, Singapore
Javadpour A, Wang G, Rezaei S, Chend S (2018) Power curtailment in cloud environment utilising load balancing machine allocation. In: 2018 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), pp 1364–1370
Javadpour A, Wang G, Li K-C (2019) A high throughput MAC protocol for wireless body area networks in intensive care. In: Smart City and Informatization, pp 23–34
Liu Q, Wang G, Liu X, Peng T, Wu J (2017) Achieving reliable and secure services in cloud computing environments. Comput Electr Eng 59:153–164
Javadpour A, Saedifar K, Wang G, Li K-C (2020) Optimal execution strategy for large orders in big data: order type using Q-learning considerations. Wirel Pers Commun. https://doi.org/10.1007/s11277-019-07019-0
Javadpour A (2019) Improving resources management in network virtualization by utilizing a software-based network. Wirel Pers Commun 106(2):505–519
Wang T, Li Y, Wang G, Cao J, Bhuiyan MZA, Jia W (2017) Sustainable and efficient data collection from WSNs to cloud. IEEE Trans Sustain Comput PP(99):1
Javadpour A, Adelpour N, Wang G, Peng T (2018) Combing fuzzy clustering and PSO algorithms to optimize energy consumption in WSN networks. In: 2018 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation, pp 1371–1377
Dai Y, Li K-C, Wang G (2018) Conceptual alignment deep neural networks. J Intell Fuzzy Syst 34:1631–1642
Phan T-D, Pallez G, Ibrahim S, Raghavan P (2019) A new framework for evaluating straggler detection mechanisms in mapreduce. ACM Trans Model Perform Eval Comput Syst 4(3):14:1–14:23
Yao Y, Gao H, Wang J, Sheng B, Mi N (2019) New scheduling algorithms for improving performance and resource utilization in hadoop YARN clusters. IEEE Trans Cloud Comput 1
Polato I, Ré R, Goldman A, Kon F (2014) A comprehensive view of hadoop research—a systematic literature review. J Netw Comput Appl 46(C):1–25
Ouyang X, Garraghan P, Primas B, McKee D, Townend P, Xu J (2018) Adaptive speculation for efficient internetware application execution in clouds. ACM Trans Internet Technol 18:15:1–15:22
Wang N, Yang J, Lu Z, Li X, Wu J (2016) Comparison and improvement of hadoop mapreduce performance prediction models in the private cloud. In: Wang G, Han Y, Martínez Pérez G (eds) Advances in Services Computing: 10th Asia-Pacific Services Computing Conference, APSCC 2016, Zhangjiajie, China, November 16–18, Proceedings. Springer International Publishing, Cham, pp 77–91
Pol VV, Patil SM (2016) Implementation of on-process aggregation for efficient big data processing in Hadoop MapReduce environment. In: 2016 International Conference on Inventive Computation Technologies (ICICT), vol 3, pp 1–5
Sakr S, Liu A, Fayoumi AG (2013) The family of mapreduce and large-scale data processing systems. ACM Comput Surv 46(1):11:1–11:44
Zhang X, Wu Y, Zhao C (2016) MrHeter: improving MapReduce performance in heterogeneous environments. Clust Comput 19(4):1691–1701
Sun X, He C, Lu Y (2012) ESAMR: an enhanced self-adaptive mapreduce scheduling algorithm. In: 2012 IEEE 18th International Conference on Parallel and Distributed Systems, pp 148–155
Giachetta R (2015) Computers & graphics special section on processing large geospatial data a framework for processing large scale geospatial and remote sensing data in mapreduce environment. Comput Graph 49:37–46
Chen Q, Liu C, Xiao Z (2014) Improving mapreduce performance using smart speculative execution strategy. IEEE Trans Comput 63(4):954–967
Sun M, Zhuang H, Li C, Lu K, Zhou X (2016) Scheduling algorithm based on prefetching in mapreduce clusters. Appl Soft Comput 38(C):1109–1118
Shvachko K, Kuang H, Radia S, Chansler R (2010) The hadoop distributed file system. In: 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), pp 1–10
Fan L, Gao B, Zhang F, Liu Z (2014) OS4M: achieving global load balance of mapreduce workload by scheduling at the operation level. CoRR, vol. abs/1406.3
Kanungo T, Mount DM, Netanyahu NS, Piatko CD, Silverman R, Wu AY (2002) An efficient k-means clustering algorithm: analysis and implementation. IEEE Trans Pattern Anal Mach Intell 24(7):881–892
Peng Z, Wang G (2017) An optimal energy-saving real-time task-scheduling algorithm for mobile terminals. Int J Distrib Sens Netw 13(5):1550147717707891
Zaharia M, Konwinski A, Joseph AD, Katz R, Stoica I (2008) Improving mapreduce performance in heterogeneous environments. In: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, pp 29–42
Shen H, Li C (2018) Zeno: a straggler diagnosis system for distributed computing using machine learning. In: High Performance Computing, pp 144–162
Satapathy SC, Raju KS, Mandal JK, Bhateja V (2015) Proceedings of the Second International Conference on Computer and Communication Technologies: IC3T 2015, vol 2. Springer India
Chen Q, Zhang D, Guo M, Deng Q, Guo S (2010) SAMR: a self-adaptive mapreduce scheduling algorithm in heterogeneous environment. In: 2010 10th IEEE International Conference on Computer and Information Technology, pp 2736–2743
Yang G (2011) The application of mapreduce in the cloud computing. In: 2011 2nd International Symposium on Intelligence Information Processing and Trusted Computing, pp 154–156
Li Y, Yang Q, Lai S, Li B (2015) A new speculative execution algorithm based on C4.5 decision tree for hadoop. In: Wang H, Qi H, Che W, Qiu Z, Kong L, Han Z, Lin J, Lu Z (eds) Intelligent Computation in Big Data Era: International Conference of Young Computer Scientists, Engineers and Educators, ICYCSEE 2015, Harbin, China, January 10–12, 2015. Proceedings. Springer, Berlin, pp 284–291

Acknowledgements

This work is supported in part by the National Natural Science Foundation of China under Grants 61632009 & 61472451, in part by the Guangdong Provincial Natural Science Foundation under Grant 2017A030308006, and in part by the High-Level Talents Program of Higher Education in Guangdong Province under Grant 2016ZJ01.

Author information

Authors and Affiliations

School of Computer Science, Guangzhou University, Guangzhou, 510006, China
Amir Javadpour & Guojun Wang
Bernoulli Institute for Mathematics and Computer Science, University of Groningen, Groningen, The Netherlands
Samira Rezaei
Department of Computer Science and Information Engineering, Providence University, Taichung, Taiwan
Kuan-Ching Li

Corresponding author

Correspondence to Guojun Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Algorithm A: Calculate the progress score (or weight) of each task based on the task's current phase.

Algorithm B: Calculate the progress score for tasks in the map phase.

Algorithm C: Compute the progress score for tasks in the reduce phase.

See Figs. 13, 14 and 15.
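Algorithms A to C appear here only by title. As a rough illustration of the general technique they name, stage-weighted progress scores of the kind used by SAMR- and ESAMR-style schedulers, here is a minimal Python sketch followed by a LATE-style remaining-time estimate. It is not the authors' code: the stage weights, the `Task` fields, and the straggler rule (progress rate below half the mean rate of tasks in the same phase) are hypothetical placeholders chosen for illustration. The abstract indicates that more accurate per-stage estimates are what the paper's neural network is meant to provide.

```python
from dataclasses import dataclass

# Hypothetical stage weights: the fraction of a task's run time spent in each
# stage. Each tuple sums to 1. SAMR/ESAMR-style schedulers estimate these
# from historical runs; the values below are placeholders.
MAP_WEIGHTS = (0.7, 0.3)          # M1, M2
REDUCE_WEIGHTS = (0.4, 0.2, 0.4)  # R1 (shuffle), R2 (sort), R3 (reduce)


@dataclass
class Task:
    task_id: str
    phase: str           # "map" or "reduce"
    stage: int           # 0-based index of the stage currently running
    sub_progress: float  # fraction of the current stage's input processed, in [0, 1]
    elapsed: float       # seconds since the task started (> 0)


def weighted_progress(weights, stage, sub_progress):
    """Finished stages count in full; the running stage counts partially."""
    return sum(weights[:stage]) + weights[stage] * sub_progress


def progress_score(task):
    """Dispatch on the task's phase, in the spirit of Algorithm A."""
    if task.phase == "map":      # map-phase score (cf. Algorithm B)
        return weighted_progress(MAP_WEIGHTS, task.stage, task.sub_progress)
    if task.phase == "reduce":   # reduce-phase score (cf. Algorithm C)
        return weighted_progress(REDUCE_WEIGHTS, task.stage, task.sub_progress)
    raise ValueError(f"unknown phase: {task.phase!r}")


def progress_rate(task):
    return progress_score(task) / task.elapsed


def time_to_end(task):
    """LATE-style estimate, assuming the task keeps its average progress rate."""
    rate = progress_rate(task)
    if rate <= 0.0:
        return float("inf")
    return (1.0 - progress_score(task)) / rate


def stragglers(tasks, slow_ratio=0.5):
    """Flag tasks whose progress rate is below `slow_ratio` times the mean
    rate of tasks in the same phase (a hypothetical threshold)."""
    flagged = []
    for phase in ("map", "reduce"):
        group = [t for t in tasks if t.phase == phase]
        if not group:
            continue
        mean_rate = sum(progress_rate(t) for t in group) / len(group)
        flagged += [t.task_id for t in group
                    if progress_rate(t) < slow_ratio * mean_rate]
    return flagged


if __name__ == "__main__":
    tasks = [
        Task("map_a", "map", 1, 0.5, 50.0),
        Task("map_b", "map", 1, 0.0, 50.0),
        Task("map_c", "map", 0, 0.2, 50.0),
        Task("red_a", "reduce", 2, 0.5, 100.0),
        Task("red_b", "reduce", 0, 0.25, 100.0),
    ]
    for t in tasks:
        print(t.task_id, round(progress_score(t), 3), round(time_to_end(t), 1))
    print("stragglers:", stragglers(tasks))
```

With these placeholder weights, the script flags `map_c` and `red_b` as stragglers, since each progresses at well under half the mean rate of its phase.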

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Javadpour, A., Wang, G., Rezaei, S. et al. RETRACTED ARTICLE: Detecting straggler MapReduce tasks in big data processing infrastructure by neural network. J Supercomput 76, 6969–6993 (2020). https://doi.org/10.1007/s11227-019-03136-6

Issue Date: September 2020
DOI: https://doi.org/10.1007/s11227-019-03136-6
