MIRAGE: A consolidation aware migration avoidance genetic job scheduling algorithm for virtualized data centers
Introduction
Data centers have become the essential computing infrastructure driving the modern digitized economy across all industries. But the fast-growing number and scale of data centers also raise high energy consumption problems. According to the 2016 United States Data Center Energy Usage Report [31], data centers in the U.S. accounted for 70 billion kilowatt hours of electricity consumption in 2014. That is around 1.8% of total U.S. consumption, or equivalent to the amount consumed by about 6.4 million average American homes that year. Hence building green data centers with higher energy efficiency is a necessary means to address many urgent needs, including environmental protection, operation cost reduction, and improvement of computing performance per watt.
One of the key factors causing low energy efficiency is the inefficient resource utilization of data centers [5]. Due to the dynamic nature of computing workload, data centers provision resources according to the peak demand. As a result, many studies [2] have indicated that the average resource utilization of a data center is only between 10% and 50%, while an idle server can consume as much as 50% of the energy of a fully utilized server [9]. To address this issue, many modern data centers have been virtualized, employing virtualization techniques to host user applications and jobs in virtual machines (VMs). A server virtualization technique, like KVM, allows the resource allocation of a VM to be configured according to the workload, and enables a VM to be migrated freely among physical machines (PMs) at runtime. Hence multiple VMs can be packed on a single PM to let the PM run in a more energy efficient condition, and under-utilized PMs can be turned off during off-peak hours to save energy. This energy management technique is called VM consolidation, and it is an effective and widely used approach to improve the energy efficiency of data centers. However, VM consolidation comes at a price: the negative impacts of VM migrations, such as performance degradation [7], [19] and service disruption [24], [49] of applications, additional resource consumption [34], [41], and the risk of failure of the hosted applications [17]. Therefore, minimizing the overhead of VM migrations during consolidation has drawn increasing interest.
While live migration techniques [11], [15], [24] are available to reduce migration time by minimizing the memory footprint of a VM, they cannot prevent VM migrations from happening. Hence, most recent consolidation strategies [16], [18], [32], [39] have aimed to minimize energy (the number of active PMs) and VM migration cost at the same time. However, all these approaches assume workload variations are unpredictable, so they only attempt to find the best VM placement under the current system workload, and the VM migration cost is modeled as the difference between the VM placements before and after a consolidation action occurs. We call them “reactive” consolidation aware strategies, because their goal is to minimize the migration cost of the current consolidation action without considering how to avoid migrations in the future. As a result, when a VM is migrated to a soon-to-be-inactive PM, the VM might be forced to migrate again. Therefore, in contrast to previous work, we make a first attempt to propose a proactive approach that can avoid VM migrations over future time intervals.
Indeed, without prior knowledge of the system workload, it is difficult to avoid VM migrations. Hence, in this work, we address our problem on batch processing systems [13], [20], such as HPC (High Performance Computing) systems [19] or MapReduce production clusters [37]. These systems often exhibit periodic workload patterns, and the job execution time is given by users during job submission. Therefore, we believe this prior workload information can be used to design a proactive approach for minimizing VM migrations. The intuition of our approach is to schedule a job on a machine that will remain active throughout the job execution time, so that possible VM migrations can be avoided. To achieve that, we first use a semi-static consolidation strategy to determine the number of active machines according to the periodic workload variation pattern. Then we design a consolidation aware scheduling algorithm that schedules jobs to machines based on the future machine turn-on/off time and remaining job execution time. To the best of our knowledge, we are the first to propose a consolidation aware scheduling algorithm for minimizing VM migrations in batch processing systems.
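The intuition above can be illustrated with a minimal sketch. It is not the MIRAGE algorithm itself: it assumes each machine has a single known active window from the semi-static consolidation plan, ignores capacity constraints, and uses hypothetical names (`Machine`, `Job`, `covers`, `pick_machine`). The tie-breaking rule (prefer the machine that turns off soonest) is also an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Machine:
    name: str
    on: int   # time the machine is turned on (hypothetical time units)
    off: int  # time the machine is scheduled to be turned off


@dataclass(frozen=True)
class Job:
    name: str
    start: int
    runtime: int  # execution time supplied by the user at submission


def covers(machine: Machine, job: Job) -> bool:
    """True if the machine stays active for the whole job execution."""
    return machine.on <= job.start and job.start + job.runtime <= machine.off


def pick_machine(job: Job, machines: List[Machine]) -> Optional[Machine]:
    """Pick a machine on which the job can finish without being migrated.

    Returns None when no machine stays active long enough, in which case
    a migration could not be avoided under this simplified model.
    """
    candidates = [m for m in machines if covers(m, job)]
    if not candidates:
        return None
    # Assumed tie-break: use the machine that turns off soonest, keeping
    # long-lived machines available for longer jobs.
    return min(candidates, key=lambda m: m.off)
```

For example, with one machine active over `[0, 10)` and another over `[0, 24)`, a short job is placed on the first machine, while a job that would outlive the first machine is placed on the second.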
Our main contributions are:
- We recognize the importance of co-design between consolidation and scheduling in batch processing systems. With prior knowledge of the machine turn-on/off time, we design a scheduling algorithm to avoid future VM migrations.
- Our scheduling algorithm aims to minimize the number of VM migrations over future scheduling intervals, while previous approaches only reduce the migration cost of the current consolidation action.
- We formulate our consolidation aware scheduling problem as an integer programming optimization problem and construct a genetic algorithm to find a near optimal solution. Therefore, our approach is called MIRAGE, which stands for {Mi}g{r}ation {A}voidance {Ge}netic algorithm.
- To demonstrate the efficiency of our approach, we compare our algorithm with two efficient job scheduling algorithms [8], [46]. Our experimental results show that our algorithm outperforms these scheduling algorithms with 25% fewer VM migrations. Even compared with our previously proposed consolidation aware greedy scheduling algorithm [38], MIRAGE can still reduce the number of VM migrations by more than 33%.
The remainder of the paper is organized as follows. Related work is given in Section 2, and Section 3 describes the consolidation aware scheduling problem and defines our ILP (Integer Linear Programming) formulation. Section 4 introduces our proposed MIRAGE algorithm. The experimental setup and results are presented in Section 5 and Section 6, respectively. Finally, the paper is concluded in Section 7.
Section snippets
Migration aware VM consolidation
Some works have extended their VM consolidation strategies to be migration aware. For dynamic consolidation strategies, VM migrations are triggered according to the monitored workload on a machine, when the workload is too low or too high. As a result, VM migrations could occur at any time, but the decisions only need to be made on one machine at a time. Therefore, dynamic consolidation strategies [3], [14], [43] could be easily extended to be migration aware by choosing the
System model
In this work, we consider a consolidation aware job scheduling problem in a virtualized data center running batch processing jobs, which are one of the main workloads in many application domains, such as data analytics and scientific computing. We describe below how resource management and energy management are controlled by a job scheduler and a semi-static consolidation strategy in our system model.
A key resource management component in a batch processing system is its job scheduler.
MIRAGE: migration avoidance genetic algorithm
The problem formulated in the previous section is NP-hard. Hence, we construct a genetic algorithm as a heuristic to solve it. A genetic algorithm is an optimization technique that mimics a natural biological process in order to find the strongest individuals in a population. It is used to find near optimal solutions for complex problems with a large problem space and a large number of variables. A genetic algorithm initially starts with a
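As a rough illustration of how a genetic algorithm can search for a low-migration schedule, the sketch below evolves job-to-machine assignments. It is a generic textbook-style genetic algorithm, not the MIRAGE design: the chromosome encoding, tournament selection, one-point crossover, mutation rate, elitism, and all names (`count_migrations`, `genetic_schedule`) are assumptions, and the fitness simply counts jobs whose finish time exceeds the turn-off time of their assigned machine, ignoring capacity.

```python
import random


def count_migrations(assignment, job_ends, machine_offs):
    """Fitness: number of jobs that finish after their machine turns off."""
    return sum(1 for j, m in enumerate(assignment)
               if job_ends[j] > machine_offs[m])


def genetic_schedule(job_ends, machine_offs, pop_size=30, generations=100,
                     mutation_rate=0.1, seed=0):
    """Search for an assignment (job index -> machine index) with few migrations."""
    rng = random.Random(seed)
    n_jobs, n_machines = len(job_ends), len(machine_offs)

    def fitness(assignment):
        return count_migrations(assignment, job_ends, machine_offs)

    # Initial population: random assignments of jobs to machines.
    population = [[rng.randrange(n_machines) for _ in range(n_jobs)]
                  for _ in range(pop_size)]

    for _ in range(generations):
        population.sort(key=fitness)
        next_gen = population[:2]  # elitism: keep the two best individuals
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            p1 = min(rng.sample(population, 3), key=fitness)
            p2 = min(rng.sample(population, 3), key=fitness)
            # One-point crossover.
            cut = rng.randrange(1, n_jobs) if n_jobs > 1 else 0
            child = p1[:cut] + p2[cut:]
            # Mutation: reassign a job to a random machine.
            child = [rng.randrange(n_machines)
                     if rng.random() < mutation_rate else gene
                     for gene in child]
            next_gen.append(child)
        population = next_gen

    return min(population, key=fitness)
```

A call such as `genetic_schedule([5, 12, 8, 20], [10, 24])` returns a list with one machine index per job; `count_migrations` can then be applied to the result to see how many jobs would still outlive their machine.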
Experimental setup
In this section, we briefly explain our real workload traces and synthetic workload. We use two real workload traces, PIK and LLNL Atlas. We also generated a synthetic workload trace using a publicly released synthetic generator [23]. We show the CDF and the nature of all the workloads in this section. Let us begin with the explanation of the two real traces.
- PIK: The PIK workload [35] is a collection of 3 years of data from Potsdam
Experimental results
In this section, we first analyze the impact of several tuning parameters in our genetic algorithm to find the best setting. Then, we compare the number of migrations between MIRAGE and several previous job scheduling strategies, including the optimal results from an ILP solver. Finally, we show the robustness of MIRAGE by achieving consistent results over various workload patterns.
Conclusion
This paper aims to address the VM migration problem in a virtualized data center. We proposed a novel consolidation-aware job scheduling algorithm called MIRAGE to minimize VM migrations with prior job information. We made the following contributions. First, we proposed the consolidation-aware job scheduling problem and formally formulated it as an ILP problem, so that an optimal solution can be found using an ILP solver. Second, since the optimal formulation is NP-hard, we designed a
CRediT authorship contribution statement
Satyajit Padhy: Conception and design of study, Acquisition of data, Analysis and/or interpretation of data, Writing - original draft, Writing - review & editing. Jerry Chou: Conception and design of study, Analysis and/or interpretation of data, Writing - original draft, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
All authors approved the version of the manuscript to be published.
References (50)
- et al. A genetic algorithm for minimizing total tardiness/earliness of weighted jobs in a batched delivery system. Comput. Ind. Eng. (2012)
- et al. Heuristics for periodical batch job scheduling in a MapReduce computing framework. Inform. Sci. (2016)
- et al. The workload on parallel supercomputers: modeling the characteristics of rigid jobs. (2003)
- et al. Sandpiper: Black-box and gray-box resource management for virtual machines. Comput. Netw. (2009)
- et al.
- et al. The case for energy-proportional computing. Computer (2007)
- et al. Adaptive threshold-based approach for energy-efficient consolidation of virtual machines in cloud data centers.
- et al. Genetic algorithm with elitist model and its convergence. Int. J. Pattern Recognit. Artif. Intell. (1996)
- et al. Energy-efficient management of data center resources for cloud computing: A vision, architectural elements, and open challenges. (2010)
- et al. Minimisation of total tardiness for identical parallel machine scheduling using genetic algorithm. Sādhanā (2016)
- Using ant colony system to consolidate VMs for green cloud computing. IEEE Trans. Serv. Comput.
- Energy-aware scheduling in virtualized datacenters.
- Difference engine: Harnessing memory redundancy in virtual machines. Commun. ACM
- Entropy: A consolidation manager for clusters.
- Post-copy live migration of virtual machines. SIGOPS Oper. Syst. Rev.
- Mitigating the risk of cloud services downtime using live migration and high availability-aware placement.
- Dynamic placement for clustered web applications.
- Deadline aware virtual machine scheduler for grid and cloud computing.
- Genetic algorithm using the inhomogeneous Markov chain for job shop scheduling problem.
- Fast transparent migration for virtual machines.
Satyajit Padhy is a Ph.D. student at the Institute of Information System and Applications at National Tsing Hua University (NTHU). He received his Master’s degree from Amity University, India in 2012. His research interests include distributed and parallel systems, resource management, cloud computing and network function virtualization.
Dr. Jerry Chou has been an associate professor in the Computer Science department at National Tsing Hua University (NTHU) since 2011. Dr. Chou received his Ph.D. degree from the Computer Science and Engineering department at UCSD in 2009. Before joining NTHU as a faculty member, Dr. Chou worked in the data management group at Lawrence Berkeley National Lab (LBNL). Dr. Chou’s research interests include high performance computing, cloud computing, data management, and distributed and parallel systems. His work has led to over 40 publications in international conferences and journals, and he has served as a reviewer and program committee member for several high impact journals, including TPDS, JDPC, etc.