Abstract
Cloud computing is the fastest-growing distributed computing paradigm; it provides online IT resources on demand under a pay-as-you-go billing model. The success of this paradigm enables cloud providers to offer an extensive collection of parallel computing resources for Big Data workflow scheduling problems. Although workflow scheduling has been studied extensively, most existing approaches are unable to meet user-specified deadline constraints at a low cost. In this paper, a Dynamic Cost-Efficient Deadline-Aware (DCEDA) heuristic algorithm is proposed for scheduling Big Data workflows; it produces the cheapest schedule that still meets the deadline constraint. DCEDA makes scheduling decisions for workflow tasks dynamically, accepting a decision only if it does not lead to a deadline violation later. It also continuously monitors the VM pool to identify active but idle VMs, which incur extra costs and overheads, and subsequently de-provisions them. The experimental analysis, based on the Montage workflow and randomly generated synthetic workflows with various characteristics, shows that DCEDA delivers better performance than the existing algorithms.
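The idle-VM monitoring idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the DCEDA algorithm itself, whose details are not given here; the `VM` fields, the function names `lease_cost` and `split_idle`, the one-hour billing period, and the prices are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class VM:
    # All fields are hypothetical; they are not taken from the paper.
    name: str
    price_per_period: float  # charge for each started billing period
    lease_start: float       # time the VM was provisioned (seconds)
    busy_until: float        # time its last assigned task finishes (seconds)


def lease_cost(vm, now, period=3600.0):
    """Pay-as-you-go cost, assuming every started period is billed in full."""
    periods = max(1, math.ceil((now - vm.lease_start) / period))
    return periods * vm.price_per_period


def split_idle(pool, now):
    """Partition the pool into busy VMs to keep and idle VMs to release."""
    keep = [vm for vm in pool if vm.busy_until > now]
    release = [vm for vm in pool if vm.busy_until <= now]
    return keep, release


pool = [
    VM("vm-a", 0.10, lease_start=0.0, busy_until=5000.0),
    VM("vm-b", 0.20, lease_start=0.0, busy_until=1800.0),
]
keep, release = split_idle(pool, now=3600.0)
for vm in release:
    print(f"de-provision {vm.name}, cost so far {lease_cost(vm, 3600.0):.2f}")
```

In this sketch a VM is released as soon as it has no remaining work. A real scheduler would likely also weigh the time left in the already-paid billing period and any tasks about to become ready before releasing a VM.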
Acknowledgement
This publication is an outcome of the R&D work undertaken in the project under the Visvesvaraya PhD Scheme of the Ministry of Electronics and IT, Government of India, being implemented by Digital India Corporation.
About this article
Cite this article
Ahmad, W., Alam, B., Ahuja, S. et al. A dynamic VM provisioning and de-provisioning based cost-efficient deadline-aware scheduling algorithm for Big Data workflow applications in a cloud environment. Cluster Comput 24, 249–278 (2021). https://doi.org/10.1007/s10586-020-03100-7
DOI: https://doi.org/10.1007/s10586-020-03100-7