A dynamic VM provisioning and de-provisioning based cost-efficient deadline-aware scheduling algorithm for Big Data workflow applications in a cloud environment

Cluster Computing

Abstract

Cloud computing is the fastest-growing distributed computing paradigm; it provides online IT resources on demand under a pay-as-you-go billing model. Its success enables cloud providers to offer an extensive collection of parallel computing resources for Big Data workflow scheduling problems. Although workflow scheduling has been studied extensively, most existing algorithms are unable to meet user-specified deadline constraints at low cost. In this paper, a Dynamic Cost-Efficient Deadline-Aware (DCEDA) heuristic algorithm is proposed for scheduling Big Data workflows; it produces the cheapest schedule that still meets the deadline constraint. DCEDA makes scheduling decisions for workflow tasks dynamically, choosing only those that do not cause the deadline constraint to be violated later. It also continuously monitors the VM pool to identify active idle VMs, which incur extra cost and overhead, and de-provisions them. Experimental analysis on the Montage workflow and on randomly generated synthetic workflows with various characteristics shows that DCEDA performs better than the existing algorithms.
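The idle-VM monitoring step mentioned in the abstract can be illustrated with a minimal sketch. This is not the DCEDA procedure itself, whose details are not given in the abstract. It assumes a VM is billed in fixed periods counted from its lease start, and that an idle VM is released only when its already-paid period would end before the next monitoring pass. The names `VM`, `paid_until`, `vms_to_release`, `billing_period`, and `check_interval` are hypothetical and chosen for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class VM:
    vm_id: str
    lease_start: float            # time the VM was provisioned (seconds)
    busy_until: float             # time its last assigned task finishes (seconds)
    has_queued_tasks: bool = False


def paid_until(vm: VM, now: float, billing_period: float) -> float:
    """End of the billing period already paid for at time `now`.

    Assumption: billing is per whole period, starting at lease_start,
    and at least one period is always charged.
    """
    elapsed = max(now - vm.lease_start, 0.0)
    periods = max(math.ceil(elapsed / billing_period), 1)
    return vm.lease_start + periods * billing_period


def vms_to_release(pool, now, billing_period, check_interval):
    """Return idle VMs whose paid period ends before the next check.

    Keeping an idle VM until its paid period is nearly over costs nothing
    extra under the per-period billing assumption, so only VMs about to
    roll into a new paid period are selected for de-provisioning.
    """
    release = []
    for vm in pool:
        idle = vm.busy_until <= now and not vm.has_queued_tasks
        remaining = paid_until(vm, now, billing_period) - now
        if idle and remaining <= check_interval:
            release.append(vm)
    return release
```

A scheduler loop could call `vms_to_release` once per monitoring pass and terminate the returned VMs; a busy VM, or one with queued tasks, is never selected regardless of where it sits in its billing period.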



Acknowledgement

This publication is an outcome of R&D work undertaken in a project under the Visvesvaraya PhD Scheme of the Ministry of Electronics and IT, Government of India, being implemented by Digital India Corporation.

Author information

Corresponding author

Correspondence to Wakar Ahmad.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ahmad, W., Alam, B., Ahuja, S. et al. A dynamic VM provisioning and de-provisioning based cost-efficient deadline-aware scheduling algorithm for Big Data workflow applications in a cloud environment. Cluster Comput 24, 249–278 (2021). https://doi.org/10.1007/s10586-020-03100-7

  • DOI: https://doi.org/10.1007/s10586-020-03100-7
