MIRAGE: A consolidation aware migration avoidance genetic job scheduling algorithm for virtualized data centers

https://doi.org/10.1016/j.jpdc.2021.03.004

Highlights

  • Proactively avoid VM migration in virtualized data centers.

  • A co-design between consolidation and scheduling decisions.

  • Construct an ILP formulation to find an optimal solution.

  • Apply genetic algorithm to find a heuristic solution.

  • Use real-world workload traces from parallel computing clusters for experimentation.

Abstract

Modern virtualized data centers often rely on virtual machine (VM) migrations to consolidate workload on a single machine for energy saving. But VM migrations have many drawbacks, including performance degradation and service disruption. Hence, many approaches have been proposed to minimize the overhead when migrations occur. In contrast, this work aims to proactively prevent migrations from happening in the first place. We propose a novel consolidation-aware scheduling algorithm that minimizes the number of migrations in batch processing systems by taking advantage of prior knowledge of the consolidation strategy and job information. We show that the problem can be formulated as an integer linear programming (ILP) problem, and that an effective heuristic solution can be found by a genetic algorithm. Both real and synthetic workload traces were used to evaluate our methods. Experimental results show that, compared with two popular job scheduling algorithms, our approach reduces the number of migrations by more than 25%.

Introduction

Data centers have become the essential computing infrastructure driving the modern digitized economy across all industries. But the fast-growing number and scale of data centers also raise the problem of high energy consumption. According to the 2016 United States Data Center Energy Usage Report [31], data centers in the U.S. accounted for 70 billion kilowatt hours of electricity consumption in 2014. That is around 1.8% of total U.S. consumption, or equivalent to the amount consumed by about 6.4 million average American homes that year. Hence, building green data centers with higher energy efficiency is a necessary means to address many urgent needs, including environmental protection, operating cost reduction, and improved computing performance per watt.

One of the key factors causing low energy efficiency is the inefficient resource utilization of data centers [5]. Due to the dynamic nature of computing workload, data centers provision resources according to the peak demand. As a result, many studies [2] have indicated that the average resource utilization of a data center is only between 10% and 50%, while an idle server can consume as much as 50% of the energy of a fully utilized server [9]. To address this issue, many modern data centers have been virtualized, employing virtualization techniques to host user applications and jobs in virtual machines (VMs). A server virtualization technique, like KVM, allows the resource allocation of a VM to be configured according to the workload, and enables a VM to be migrated freely among physical machines (PMs) at runtime. Hence multiple VMs can be packed on a single PM to let the PM run in a more energy-efficient condition, and under-utilized PMs can be turned off during off-peak hours for energy saving. This energy management technique is called VM consolidation, and it is an effective and widely used approach to improve the energy efficiency of data centers. However, VM consolidation comes at a price: the negative impacts of VM migrations, such as performance degradation [7], [19] and service disruption [24], [49] of applications, additional resource consumption [34], [41], and the risk of failure of the hosted applications [17]. Therefore, minimizing the overhead of VM migrations during consolidation has drawn increasing interest.

While live migration techniques [11], [15], [24] are available to reduce migration time by minimizing the memory footprint of a VM, they cannot prevent VM migrations from happening. Hence, most recent consolidation strategies [16], [18], [32], [39] aim to minimize energy (the number of active PMs) and VM migration cost at the same time. However, all these approaches assume that workload variations are unpredictable, so they only attempt to find the best VM placement under the current system workload, and the VM migration cost is modeled as the difference between the VM placements before and after a consolidation action occurs. We call them “reactive” consolidation-aware strategies, because their goal is to minimize the migration cost of the current consolidation action without considering how to avoid migrations in the future. As a result, when a VM is migrated to a soon-to-be-inactive PM, the VM might be forced to migrate again. Therefore, in contrast to previous work, we make a first attempt to propose a proactive approach that can avoid VM migrations over future time intervals.
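To make the reactive cost model concrete, the following minimal Python sketch counts migrations as the difference between consecutive placements and sums them over a sequence of consolidation actions. The function names, the dictionary representation of a placement, and the example data are illustrative assumptions, not taken from the paper.

```python
def migration_count(before, after):
    """Count VMs whose host PM differs between two placements.

    Each placement is a hypothetical dict mapping a VM name to a PM name.
    """
    return sum(1 for vm, pm in before.items()
               if vm in after and after[vm] != pm)


def total_migrations(placements):
    """Sum the migrations over a sequence of consecutive placements."""
    return sum(migration_count(a, b)
               for a, b in zip(placements, placements[1:]))


# A reactive strategy moves vm1 to pm2, which is itself turned off one
# interval later, so vm1 has to move a second time.
reactive = [{"vm1": "pm3"}, {"vm1": "pm2"}, {"vm1": "pm1"}]

# A strategy that knows pm2 will be turned off moves vm1 straight to pm1.
proactive = [{"vm1": "pm3"}, {"vm1": "pm1"}, {"vm1": "pm1"}]

print(total_migrations(reactive), total_migrations(proactive))
```

In this toy case each individual reactive step looks cheap (one migration), yet the total over the two intervals is higher than for the placement that accounts for the later shutdown.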

Indeed, without prior knowledge of the system workload, it is difficult to avoid VM migrations. Hence, in this work, we address our problem on batch processing systems [13], [20], such as HPC (High Performance Computing) systems [19] or MapReduce production clusters [37]. These systems often exhibit periodic workload patterns, and the job execution time is given by users during job submission. Therefore, we believe this prior workload information can be used to design a proactive approach for minimizing VM migrations. The intuition of our approach is to schedule a job on a machine that will remain active throughout the job execution time, so that possible VM migrations can be avoided. To achieve that, we first use a semi-static consolidation strategy to determine the number of active machines according to the periodic workload variation pattern. Then we design a consolidation-aware scheduling algorithm that schedules jobs to machines based on the future machine turn-on/off time and the remaining job execution time. To the best of our knowledge, we are the first to propose a consolidation-aware scheduling algorithm for minimizing VM migrations in batch processing systems.
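The intuition above can be sketched with a simple greedy placement rule. This is only an illustration under stated assumptions, not the MIRAGE algorithm itself: the `Machine` class, its `off_time` and `free_slots` fields, and the tie-breaking choices (earliest sufficient turn-off time first, longest-lived machine as fallback) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Machine:
    name: str
    off_time: float   # assumed: time at which the consolidation plan turns this PM off
    free_slots: int   # assumed: number of additional jobs the PM can host


def pick_machine(machines: List[Machine], now: float,
                 runtime: float) -> Optional[Machine]:
    """Prefer a machine that stays on until the job finishes."""
    finish = now + runtime
    safe = [m for m in machines
            if m.free_slots > 0 and m.off_time >= finish]
    if safe:
        # Among safe machines, take the one turned off earliest, so that
        # long-lived machines stay available for longer jobs.
        return min(safe, key=lambda m: m.off_time)
    # No safe machine exists: fall back to the active machine that stays on
    # longest. The job will probably need a migration later.
    active = [m for m in machines if m.free_slots > 0 and m.off_time > now]
    return max(active, key=lambda m: m.off_time, default=None)


machines = [Machine("pm1", 10, 1), Machine("pm2", 20, 1), Machine("pm3", 100, 1)]
print(pick_machine(machines, now=0, runtime=15).name)
```

A job submitted at time 0 with a user-supplied runtime of 15 would finish after `pm1` is turned off, so the rule skips `pm1` and selects `pm2`, the machine with the earliest turn-off time that still covers the whole execution.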

Our main contributions are:

  • We recognize the importance of co-design between consolidation and scheduling in batch processing systems. With prior knowledge of the machine turn-on/off time, we design a scheduling algorithm to avoid future VM migrations.

  • Our scheduling algorithm aims to minimize the number of VM migrations over future scheduling intervals, while previous approaches only reduce the migration cost of the current consolidation action.

  • We formulate our consolidation-aware scheduling problem as an integer programming optimization problem and construct a genetic algorithm to find a near-optimal solution. Our approach is therefore called MIRAGE, which stands for MIgRation Avoidance GEnetic algorithm.

  • To demonstrate the effectiveness of our approach, we compare our algorithm with two efficient job scheduling algorithms [8], [46]. Our experimental results show that our algorithm outperforms these scheduling algorithms with 25% fewer VM migrations. Even compared with our previously proposed consolidation-aware greedy scheduling algorithm [38], MIRAGE can still reduce the number of VM migrations by more than 33%.

The remainder of the paper is organized as follows. Related work is given in Section 2. Section 3 describes the consolidation-aware scheduling problem and defines our ILP (Integer Linear Programming) formulation. Section 4 introduces our proposed MIRAGE algorithm. The experimental setup and results are presented in Section 5 and Section 6, respectively. Finally, the paper is concluded in Section 7.


Migration aware VM consolidation

Some works have extended their VM consolidation strategies to be migration aware. For dynamic consolidation strategies, VM migrations are triggered when the monitored workload on a machine is too low or too high. As a result, VM migrations could occur at any time, but their decisions only need to be made on one machine at a time. Therefore, dynamic consolidation strategies [3], [14], [43] could be easily extended to be migration aware by choosing the

System model

In this work, we consider a consolidation-aware job scheduling problem in a virtualized data center running batch processing jobs, which are one of the main workloads in many application domains, such as data analytics and scientific computing. In our system model, resource management and energy management are controlled by a job scheduler and a semi-static consolidation strategy, as described below.

A key resource management component in a batch processing system is its job scheduler.

MIRAGE: migration avoidance genetic algorithm

The problem formulated in the previous section is NP-hard. Hence, we construct a genetic algorithm as a heuristic to solve this problem. A genetic algorithm is an optimization technique that mimics a natural biological process in order to find the strongest individuals in a population. It is used to find near-optimal solutions for complex problems with a large problem space and a large number of variables. Genetic algorithm initially starts with a
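As a rough illustration of the general technique, the sketch below applies a plain genetic algorithm to a toy job-to-machine assignment. It is not the MIRAGE encoding or its operators: the chromosome layout (one machine index per job), the cost function with its overload penalty of 10, the simplified per-machine capacity that ignores when jobs overlap, and all parameter values are assumptions made for this example.

```python
import random


def cost(assign, jobs, off_times, capacity):
    """Jobs that outlive their machine, plus a penalty for overloaded machines."""
    migrations = sum(1 for (start, run), m in zip(jobs, assign)
                     if start + run > off_times[m])
    load = [0] * len(off_times)
    for m in assign:
        load[m] += 1
    overload = sum(max(0, used - capacity) for used in load)
    return migrations + 10 * overload


def genetic_search(jobs, off_times, capacity, pop_size=30, generations=50,
                   mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    n_jobs, n_machines = len(jobs), len(off_times)

    def score(assign):
        return cost(assign, jobs, off_times, capacity)

    # Initial population: random assignments of jobs to machines.
    pop = [[rng.randrange(n_machines) for _ in range(n_jobs)]
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=score)
        history.append(score(pop[0]))
        elite = pop[:pop_size // 2]            # selection: keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n_jobs) if n_jobs > 1 else 0
            child = p1[:cut] + p2[cut:]        # one-point crossover
            child = [rng.randrange(n_machines)  # random-reset mutation
                     if rng.random() < mutation_rate else gene
                     for gene in child]
            children.append(child)
        pop = elite + children
    best = min(pop, key=score)
    return best, history + [score(best)]


# Toy instance: (submit time, runtime) per job and a turn-off time per machine.
jobs = [(0, 5), (0, 50), (10, 20), (5, 100)]
off_times = [10, 40, 200]
best, history = genetic_search(jobs, off_times, capacity=2)
print(best, history[-1])
```

Because the better half of each generation is carried over unchanged, the best cost recorded in `history` never gets worse from one generation to the next. The result is still a heuristic and may differ from the optimum an ILP solver would return.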

Experimental setup

In this section, we briefly describe our real and synthetic workloads. We use two real workload traces, the PIK and LLNL Atlas workloads. We also generate a synthetic workload trace using a publicly released synthetic generator [23]. We show the CDF and the nature of all the workloads in this section. Let us begin with the two real traces.

  • PIK: The PIK workload [35] is a collection of 3 years of data from Potsdam

Experimental results

In this section, we first analyze the impact of several tuning parameters of our genetic algorithm to find the best setting. Then, we compare the number of migrations between MIRAGE and several previous job scheduling strategies, including the optimal results from an ILP solver. Finally, we show the robustness of MIRAGE, which achieves consistent results over various workload patterns.

Conclusion

This paper aims to address the VM migration problem in a virtualized data center. We proposed a novel consolidation-aware job scheduling algorithm called MIRAGE to minimize VM migrations using prior job information. We made the following contributions. First, we proposed the consolidation-aware job scheduling problem and formally formulated it as an ILP problem, so that an optimal solution can be found using an ILP solver. Second, since the optimal formulation is NP-hard, we designed a

CRediT authorship contribution statement

Satyajit Padhy: Conception and design of study, Acquisition of data, Analysis and/or interpretation of data, Writing - original draft, Writing - review & editing. Jerry Chou: Conception and design of study, Analysis and/or interpretation of data, Writing - original draft, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

All authors approved the version of the manuscript to be published.

Satyajit Padhy is a Ph.D. student at the Institute of Information System and Applications at National Tsing Hua University (NTHU). He received his Master’s degree from Amity University, India in 2012. His research interests include distributed and parallel systems, resource management, cloud computing and network function virtualization.

References (50)

  • Clark C. et al.
  • Farahnakian F. et al., Using ant colony system to consolidate VMs for green cloud computing, IEEE Trans. Serv. Comput. (2015)
  • A. Gandhi, M. Harchol-Balter, R. Das, C. Lefurgy, Optimal power allocation in server farms, in: Proceedings of the...
  • Goiri I. et al., Energy-aware scheduling in virtualized datacenters
  • Gupta D. et al., Difference engine: Harnessing memory redundancy in virtual machines, Commun. ACM (2010)
  • M. Harvan, T. Locher, A.C. Sima, Cyclone: Unified stream and batch processing, in: 2016 45th International Conference...
  • Hermenier F. et al., Entropy: A consolidation manager for clusters
  • Hines M.R. et al., Post-copy live migration of virtual machines, SIGOPS Oper. Syst. Rev. (2009)
  • Hossain M. et al.
  • Jammal M. et al., Mitigating the risk of cloud services downtime using live migration and high availability-aware placement
  • Karve A. et al., Dynamic placement for clustered web applications
  • Khalid O. et al., Deadline aware virtual machine scheduler for grid and cloud computing
  • Li J. et al., Genetic algorithm using the inhomogeneous Markov chain for job shop scheduling problem (2015)
  • Liu H. et al.
  • Nelson M. et al., Fast transparent migration for virtual machines


Dr. Jerry Chou has been an associate professor in the Computer Science department at National Tsing Hua University (NTHU) since 2011. Dr. Chou received his Ph.D. degree from the Computer Science and Engineering department at UCSD in 2009. Before joining NTHU as a faculty member, Dr. Chou worked in the data management group at Lawrence Berkeley National Lab (LBNL). Dr. Chou’s research interests include high performance computing, cloud computing, data management, and distributed and parallel systems. His work has led to over 40 publications in international conferences and journals, and he has served as a reviewer and program committee member for several high-impact journals, including TPDS and JPDC.
