Abstract
Straggler task detection is one of the main challenges in applying MapReduce to parallelize and distribute large-scale data processing. It is defined as detecting tasks that are running on weak nodes. With two stages in the Map phase (copy and combine) and three stages in the Reduce phase (shuffle, sort and reduce), the total execution time is the sum of the execution times of these five stages. The primary purpose of this paper is to estimate the execution time of each stage correctly, which in turn yields a correct total execution time. The proposed method applies a backpropagation neural network on Hadoop to estimate the remaining execution time of tasks, which is essential for detecting straggler tasks. The results are compared with popular algorithms in this domain, such as LATE and ESAMR, and with the real remaining time for the WordCount and Sort benchmarks; they show that the method can detect straggler tasks and estimate execution time accurately. It also helps to accelerate task execution.
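To make the stage-based view concrete, the following is a minimal sketch of how a remaining-time estimate can be derived from per-stage weights and then used to flag stragglers. It follows the general progress-rate heuristic used by LATE-style schedulers and is not the neural network method of this paper. The stage weights, the function names, and the 1.5 × median threshold are all illustrative assumptions.

```python
from statistics import median

# Hypothetical stage weights: the assumed fraction of a task's total time
# spent in each stage. These numbers are illustrative, not from the paper.
MAP_WEIGHTS = (0.8, 0.2)          # copy, combine
REDUCE_WEIGHTS = (0.3, 0.3, 0.4)  # shuffle, sort, reduce


def progress_score(weights, stage_index, stage_fraction):
    """Overall task progress in [0, 1].

    Stages before stage_index count as finished; the current stage
    contributes its weight scaled by stage_fraction.
    """
    finished = sum(weights[:stage_index])
    return finished + weights[stage_index] * stage_fraction


def remaining_time(weights, stage_index, stage_fraction, elapsed_seconds):
    """Estimate remaining seconds as (1 - score) / (score / elapsed)."""
    score = progress_score(weights, stage_index, stage_fraction)
    if score <= 0 or elapsed_seconds <= 0:
        return float("inf")  # no progress observed yet, so no estimate
    rate = score / elapsed_seconds
    return (1.0 - score) / rate


def find_stragglers(estimates, factor=1.5):
    """Return ids of tasks whose estimate exceeds factor * median.

    The threshold rule is a simple assumption chosen for illustration.
    """
    cutoff = factor * median(estimates.values())
    return sorted(task for task, t in estimates.items() if t > cutoff)


if __name__ == "__main__":
    # A reduce task halfway through its sort stage after 90 seconds.
    print(remaining_time(REDUCE_WEIGHTS, 1, 0.5, 90.0))
    print(find_stragglers({"t1": 100.0, "t2": 110.0, "t3": 400.0}))
```

The quality of such an estimate depends entirely on how well the stage weights match the real workload, which is why estimating the per-stage execution times accurately matters for straggler detection.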
Change history
08 April 2024
This article has been retracted. Please see the Retraction Notice for more detail: https://doi.org/10.1007/s11227-024-06126-5
Acknowledgements
This work is supported in part by the National Natural Science Foundation of China under Grants 61632009 & 61472451, in part by the Guangdong Provincial Natural Science Foundation under Grant 2017A030308006 and High-Level Talents Program of Higher Education in Guangdong Province under Grant 2016ZJ01.
About this article
Cite this article
Javadpour, A., Wang, G., Rezaei, S. et al. RETRACTED ARTICLE: Detecting straggler MapReduce tasks in big data processing infrastructure by neural network. J Supercomput 76, 6969–6993 (2020). https://doi.org/10.1007/s11227-019-03136-6
DOI: https://doi.org/10.1007/s11227-019-03136-6