Reliability aware scheduling of bag of real time tasks in cloud environment

Computing

Abstract

A cloud environment relies on data centers with a huge number of computational resources, and the probability that at least one resource fails increases with scale. Failures make services unavailable, which reduces the reliability of the system. Application deployment in the cloud must therefore account for resource failures. In this work, we address reliability-aware scheduling of tasks with hard deadlines in the cloud environment. We design, analyze, and provide solutions for two special cases of the problem: (a) tasks with a common deadline on machines with equal failure rates, and (b) tasks with equal execution times. For the general case, we propose two-phase heuristic approaches: the first phase orders the tasks, and the second maps the tasks to machines. We evaluate the different task-ordering and task-mapping approaches through simulation on synthetic and real traces. The simulation results show that earliest-due-date ordering of tasks, combined with mapping the current task to the most reliable machine and dropping long tasks, performs better in general settings. We also observe that task repetition and replication further improve the performance of the heuristics.
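
The two-phase idea in the abstract (order the tasks, then map each one to a machine) can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Task` and `schedule_edd_most_reliable` are hypothetical, each machine is assumed to have a constant failure rate so that a task of length e on a machine with rate λ succeeds with probability exp(-λ·e), and the dropping rule here simply discards any task that cannot meet its deadline on any machine, which may differ from the long-task dropping evaluated in the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    exec_time: float  # execution time (assumed identical on every machine)
    deadline: float   # hard deadline, measured from time 0


def schedule_edd_most_reliable(tasks, failure_rates):
    """Phase 1: order tasks by earliest due date.
    Phase 2: place each task on the most reliable machine that still
    meets its deadline; drop the task if no machine can.

    Returns (assignments, dropped), where each assignment is
    (task name, machine index, start, finish, success probability).
    """
    free_at = [0.0] * len(failure_rates)  # time at which each machine is idle
    # Lower failure rate means more reliable, so try those machines first.
    by_reliability = sorted(range(len(failure_rates)),
                            key=lambda m: failure_rates[m])
    assignments, dropped = [], []
    for task in sorted(tasks, key=lambda t: t.deadline):
        for m in by_reliability:
            finish = free_at[m] + task.exec_time
            if finish <= task.deadline:
                # Assumed exponential failure model: exp(-rate * duration).
                success = math.exp(-failure_rates[m] * task.exec_time)
                assignments.append((task.name, m, free_at[m], finish, success))
                free_at[m] = finish
                break
        else:
            # No machine can finish the task before its hard deadline.
            dropped.append(task.name)
    return assignments, dropped
```

Calling `schedule_edd_most_reliable(tasks, failure_rates)` with a list of `Task` objects and one failure rate per machine returns the placements and the names of the dropped tasks. Trying machines in order of increasing failure rate keeps the sketch short; a fuller treatment would also weigh machine load against reliability.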

Author information

Corresponding author

Correspondence to Chinmaya Kumar Swain.

About this article

Cite this article

Swain, C.K., Saini, N. & Sahu, A. Reliability aware scheduling of bag of real time tasks in cloud environment. Computing 102, 451–475 (2020). https://doi.org/10.1007/s00607-019-00749-w

  • DOI: https://doi.org/10.1007/s00607-019-00749-w