Job scheduler for streaming applications in heterogeneous distributed processing systems

The Journal of Supercomputing

Abstract

In this study, we investigated the problem of scheduling streaming applications in a heterogeneous cluster environment and, building on our previous work, developed the maximum throughput scheduler algorithm (MT-Scheduler) for streaming applications. The proposed algorithm uses dynamic programming to map the application topology onto the heterogeneous distributed system according to its computing and data transfer requirements, while taking into account the capacity of the underlying cluster resources. The approach maximizes system throughput by identifying the computing or transfer bottleneck and minimizing the time incurred there. The MT-Scheduler supports applications structured as a directed acyclic graph. We conducted experiments using three Storm microbenchmark topologies in both simulated and real Apache Storm environments. For the performance evaluation, we compared the proposed MT-Scheduler with a simulated round robin scheduler and the default Storm scheduler. The results indicated that the MT-Scheduler outperforms the default round robin approach in terms of both average system latency and throughput.
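The bottleneck idea in the abstract can be illustrated with a small sketch. This is not the MT-Scheduler itself, whose details are not given here; it is a simplified dynamic program under stated assumptions: the topology is a linear pipeline rather than a general directed acyclic graph, co-located stages do not contend for a node's capacity, and all names (`min_bottleneck_mapping`, `compute`, `out_size`, `power`, `bandwidth`) are hypothetical. Throughput is taken as the reciprocal of the slowest compute or transfer step.

```python
from typing import List, Tuple


def min_bottleneck_mapping(
    compute: List[float],          # work per tuple for each pipeline stage (hypothetical units)
    out_size: List[float],         # data emitted per tuple by stage i toward stage i + 1
    power: List[float],            # processing rate of each node
    bandwidth: List[List[float]],  # bandwidth[k][j]: link rate from node k to node j
) -> Tuple[float, List[int]]:
    """Map a linear pipeline onto nodes so the slowest step is as fast as possible.

    Simplifying assumption: stages placed on the same node do not slow
    each other down, so each stage's compute time is counted on its own.
    """
    n, m = len(compute), len(power)
    inf = float("inf")
    # best[i][j]: smallest achievable bottleneck time for stages 0..i
    # when stage i runs on node j.
    best = [[inf] * m for _ in range(n)]
    prev = [[-1] * m for _ in range(n)]

    for j in range(m):
        best[0][j] = compute[0] / power[j]

    for i in range(1, n):
        for j in range(m):
            run = compute[i] / power[j]
            for k in range(m):
                # Moving data costs nothing when both stages share a node.
                move = 0.0 if k == j else out_size[i - 1] / bandwidth[k][j]
                candidate = max(best[i - 1][k], move, run)
                if candidate < best[i][j]:
                    best[i][j] = candidate
                    prev[i][j] = k

    # Pick the best node for the last stage, then walk back through prev.
    j = min(range(m), key=lambda node: best[n - 1][node])
    bottleneck = best[n - 1][j]
    placement = [0] * n
    for i in range(n - 1, -1, -1):
        placement[i] = j
        j = prev[i][j]
    return bottleneck, placement


if __name__ == "__main__":
    # Three stages, two nodes of different speed, symmetric 5-unit links.
    time_per_tuple, placement = min_bottleneck_mapping(
        compute=[4.0, 8.0, 2.0],
        out_size=[10.0, 10.0],
        power=[2.0, 4.0],
        bandwidth=[[0.0, 5.0], [5.0, 0.0]],
    )
    print("bottleneck time:", time_per_tuple)
    print("throughput:", 1.0 / time_per_tuple)
    print("placement:", placement)
```

Because the objective is a maximum over steps, extending a partial mapping never lowers its bottleneck, which is what lets the table be filled stage by stage. A realistic scheduler would also need to account for node capacity shared by co-located tasks and for branching topologies, both of which this sketch leaves out.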

Author information

Corresponding author

Correspondence to Ali Al-Sinayyid.

About this article

Cite this article

Al-Sinayyid, A., Zhu, M. Job scheduler for streaming applications in heterogeneous distributed processing systems. J Supercomput 76, 9609–9628 (2020). https://doi.org/10.1007/s11227-020-03223-z

  • DOI: https://doi.org/10.1007/s11227-020-03223-z
