Expelliarmus: Semantic-centric virtual machine image management in IaaS Clouds

https://doi.org/10.1016/j.jpdc.2020.08.001
Open access under a Creative Commons license.

Highlights

  • A novel semantic model representing VMIs as structured graphs.

  • VMI clustering based on functionality with low similarity computation overheads.

  • A semantics-aware VMI decomposition method without costly content deduplication.

  • Optimized VMI assembly with compatible base image and selective package retrieval.

  • Repository size reduced by up to 22 times, with improved VMI publishing and retrieval.

  • Scalability analysis of growing repository size and of VMI retrieval performance.

Abstract

Infrastructure-as-a-service (IaaS) Clouds concurrently accommodate diverse sets of user requests, requiring an efficient strategy for storing and retrieving virtual machine images (VMIs) at a large scale. VMI storage management requires dealing with many VMIs, each typically gigabytes in size, which leads to VMI sprawl that hinders elastic resource management and provisioning. Unfortunately, existing techniques for VMI management overlook VMI semantics (i.e., the level of base images and software packages), and therefore either have a restricted ability to identify and extract reusable functionality or incur higher VMI publishing and retrieval overheads. In this paper, we propose Expelliarmus, a novel VMI management system that helps to minimize VMI storage, publishing and retrieval overheads. To achieve this goal, Expelliarmus incorporates three complementary features. First, it models VMIs as semantic graphs to facilitate their similarity computation. Second, it provides a semantically-aware VMI decomposition and base image selection to extract and store non-redundant base images and software packages. Third, it assembles VMIs from the required software packages upon user request. We evaluate Expelliarmus on a real test-bed using a representative set of synthetic Cloud VMIs. Experimental results show that our semantic-centric approach reduces the repository size by 2.3 to 22 times compared to state-of-the-art systems (e.g., IBM's Mirage and Hemera), with a significant improvement in VMI publishing performance and a slight improvement in retrieval performance.
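The three features summarized above (similarity computation, decomposition into non-redundant parts, and assembly on request) can be illustrated with a minimal Python sketch. It assumes a VMI can be reduced to a base image name plus a set of package names, and it substitutes plain Jaccard similarity over package sets for the paper's semantic-graph similarity. The names `VMI`, `similarity`, `Repository`, `publish` and `assemble` are hypothetical and are not Expelliarmus's actual interface.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VMI:
    """Hypothetical semantic view of a VMI: a base image plus installed packages."""
    name: str
    base_image: str
    packages: frozenset = field(default_factory=frozenset)


def similarity(a: VMI, b: VMI) -> float:
    """Jaccard similarity of package sets; 0.0 if the base images differ.

    A simplified stand-in for graph-based semantic similarity, not the paper's metric.
    """
    if a.base_image != b.base_image:
        return 0.0
    union = a.packages | b.packages
    if not union:
        return 1.0  # two bare images on the same base are identical
    return len(a.packages & b.packages) / len(union)


class Repository:
    """Stores each base image and each (base image, package) pair at most once."""

    def __init__(self) -> None:
        self.base_images: set = set()
        # Assumption: packages are keyed per base image, since a package built
        # for one distribution is not necessarily reusable on another.
        self.packages: set = set()

    def publish(self, vmi: VMI) -> int:
        """Decompose a VMI, store only parts not yet present, return how many were new."""
        new_items = 0
        if vmi.base_image not in self.base_images:
            self.base_images.add(vmi.base_image)
            new_items += 1
        for pkg in vmi.packages:
            key = (vmi.base_image, pkg)
            if key not in self.packages:
                self.packages.add(key)
                new_items += 1
        return new_items

    def assemble(self, base_image: str, required: set, name: str = "assembled") -> VMI:
        """Build a VMI from a stored base image and only the requested packages."""
        if base_image not in self.base_images:
            raise KeyError(f"unknown base image: {base_image}")
        missing = {p for p in required if (base_image, p) not in self.packages}
        if missing:
            raise KeyError(f"packages not in repository: {sorted(missing)}")
        return VMI(name, base_image, frozenset(required))


if __name__ == "__main__":
    web = VMI("web", "ubuntu-18.04", frozenset({"nginx", "openssl"}))
    db = VMI("db", "ubuntu-18.04", frozenset({"postgresql", "openssl"}))
    repo = Repository()
    print(similarity(web, db))           # shared: openssl; union: 3 packages
    print(repo.publish(web), repo.publish(db))
    print(repo.assemble("ubuntu-18.04", {"nginx", "postgresql"}))
```

In this toy example, publishing both images stores one base image and three packages instead of two full copies, because the shared base image and `openssl` are kept only once. The sketch ignores package dependencies and base image compatibility, which a real system would have to resolve during assembly.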

Keywords

Virtual machine image management
Semantic similarity
Storage optimization
Virtual machine image publishing
Virtual machine image retrieval


Nishant Saurabh is a research assistant at the University of Klagenfurt, Austria, and has been pursuing his Ph.D. at the University of Innsbruck, Austria, since 2015. Before starting his Ph.D., he obtained his M.Sc. degree, specialized in High Performance Distributed Computing, from Vrije Universiteit, Netherlands, and received his B.E. degree, specialized in computer science and engineering, from PRMITR, affiliated to SGBAU University, India. His research areas include resource management and placement optimization in parallel and distributed systems.

Shajulin Benedict graduated from Manonmaniam Sundaranar University, India, in 2001; received his M.E. degree in Digital Communication and Computer Networking from A.K.C.E., Anna University, India, in 2004; and received his Ph.D. degree in the area of Grid scheduling from Anna University, India. He served as a Professor at the SXCCE Research Centre of Anna University, India. He currently works at the Indian Institute of Information Technology Kottayam, Kerala, India. His research interests include compilers, HPC, Cloud, Grid scheduling and performance analysis of exascale applications.

Jorge G. Barbosa received his B.Sc. degree in Electrical and Computer Engineering from the Faculty of Engineering of the University of Porto (FEUP), Portugal; his M.Sc. in Digital Systems from the University of Manchester Institute of Science and Technology, England, in 1993; and his Ph.D. in Electrical and Computer Engineering from FEUP, Portugal, in 2001. Since 2001, he has been an Assistant Professor at FEUP. His research interests include parallel and distributed computing, heterogeneous computing, scheduling in heterogeneous environments and cloud computing.

Radu Prodan is a professor in distributed systems at the Institute of Software Technology, University of Klagenfurt. He received his Ph.D. in 2004 from the Vienna University of Technology and was an Associate Professor at the University of Innsbruck, Austria, until 2018. His research interests include performance, optimization, and resource management tools for parallel and distributed applications. He has authored over 100 publications and received two IEEE best paper awards.