Abstract
Minimizing the energy consumption of cloud data centers (DCs) has attracted considerable attention from the research community in recent years, particularly because emerging Cyber-Physical Systems increasingly depend on them. An effective way to improve the energy efficiency of DCs is to use efficient job scheduling strategies. However, the most challenging issue in selecting such a strategy is ensuring that the scheduled tasks honor their service-level agreement (SLA) bindings. Hence, this article presents an energy-aware and SLA-driven job scheduling framework based on MapReduce. The primary aim of the proposed framework is to explore the task-to-slot/container mapping problem as a special case of energy-aware scheduling in a deadline-constrained scenario. This problem can be viewed as a complex multi-objective problem subject to several constraints. To address it efficiently, the problem is divided into three major subproblems (SPs): deadline segregation, map-phase energy-aware scheduling, and reduce-phase energy-aware scheduling. Each SP is formulated using Integer Linear Programming. To solve these SPs effectively, heuristics based on the Greedy strategy are used along with the classical Hungarian algorithm for serial and serial-parallel systems. Moreover, the proposed scheme explores the potential of splitting the Map/Reduce phase(s) into multiple stages to achieve higher energy reductions, leveraging the classical Greedy approach and priority queues. The proposed scheme has been validated using real-time data traces acquired from OpenCloud. Its performance is compared with existing schemes using four evaluation metrics: number of stages, total energy consumption, total makespan, and SLA violations. The results demonstrate the efficacy of the proposed scheme in comparison to the other schemes under different workload scenarios.
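To make the task-to-slot mapping idea concrete, the following is a minimal sketch of a greedy, deadline-aware, energy-minimizing assignment driven by a priority queue. It is not the article's method: the ILP formulations and the Hungarian-based steps are not reproduced, and the `Slot` model (a `speed` in work units per second and a `power` in watts), the task tuples, and the function name `greedy_schedule` are all assumptions introduced for illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Slot:
    """Hypothetical slot/container model (not taken from the article)."""
    name: str
    speed: float          # work units processed per second (assumed)
    power: float          # watts drawn while busy (assumed)
    free_at: float = 0.0  # time at which the slot becomes idle


def greedy_schedule(tasks, slots):
    """Assign each task to the lowest-energy slot that still meets its deadline.

    tasks: iterable of (task_id, work, deadline) tuples.
    Returns (assignment, total_energy, violations).
    """
    # Priority queue ordered by deadline: most urgent task is served first.
    queue = [(deadline, task_id, work) for task_id, work, deadline in tasks]
    heapq.heapify(queue)

    assignment = {}
    total_energy = 0.0
    violations = []

    while queue:
        deadline, task_id, work = heapq.heappop(queue)
        best = None  # (energy, finish_time, slot)
        for slot in slots:
            runtime = work / slot.speed
            finish = slot.free_at + runtime
            energy = slot.power * runtime
            if finish <= deadline and (best is None or energy < best[0]):
                best = (energy, finish, slot)

        if best is None:
            # No slot can meet the deadline: use the earliest-finishing slot
            # and record an SLA violation for this task.
            slot = min(slots, key=lambda s: s.free_at + work / s.speed)
            runtime = work / slot.speed
            best = (slot.power * runtime, slot.free_at + runtime, slot)
            violations.append(task_id)

        energy, finish, slot = best
        slot.free_at = finish
        assignment[task_id] = slot.name
        total_energy += energy

    return assignment, total_energy, violations
```

In this sketch, a task goes to a fast but power-hungry slot only when a slower, cheaper slot would miss the deadline. This mirrors the energy versus SLA trade-off that the abstract describes, under the simplified cost model assumed here.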
Index Terms
- Energy and SLA-driven MapReduce Job Scheduling Framework for Cloud-based Cyber-Physical Systems