Job scheduler for streaming applications in heterogeneous distributed processing systems

Al-Sinayyid, Ali; Zhu, Michelle

doi:10.1007/s11227-020-03223-z

Published: 02 March 2020

Volume 76, pages 9609–9628 (2020)

The Journal of Supercomputing

Abstract

In this study, we investigated the problem of scheduling streaming applications in a heterogeneous cluster environment and, building on our previous work, developed the maximum throughput scheduler algorithm (MT-Scheduler) for streaming applications. The proposed algorithm uses dynamic programming to efficiently map the application topology onto the heterogeneous distributed system based on computing and data transfer requirements, while also taking into account the capacity of the underlying cluster resources. The approach maximizes system throughput by identifying the computing/transfer bottleneck and minimizing the time incurred there. The MT-Scheduler supports applications structured as a directed acyclic graph. We conducted experiments using three Storm microbenchmark topologies in both simulated and real Apache Storm environments. For the performance evaluation, we compared the MT-Scheduler with a simulated round robin scheduler and the default Storm scheduler. The results indicated that the MT-Scheduler outperforms the default round robin approach in both average system latency and throughput.
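
The bottleneck idea in the abstract can be illustrated with a small sketch. This is not the MT-Scheduler itself: it assumes a linear pipeline rather than a general directed acyclic graph, assumes nodes are visited in a fixed order with contiguous groups of operators placed on each node, and uses hypothetical names (`map_pipeline`, `work`, `out`, `power`, `bw`) that do not come from the article. Under those assumptions, a memoized dynamic program picks the placement whose slowest computing or transfer stage is as fast as possible; throughput is then the reciprocal of that bottleneck time.

```python
from functools import lru_cache


def map_pipeline(work, out, power, bw):
    """Place a linear pipeline on ordered nodes, minimizing the bottleneck time.

    work[i]  : per-tuple computing cost of operator i (assumed unit)
    out[i]   : per-tuple data size sent from operator i to operator i + 1
    power[j] : processing speed of node j
    bw[j][k] : bandwidth of the link from node j to node k
    Returns (bottleneck_time, placement), where placement[i] is the node of operator i.
    """
    n, m = len(work), len(power)

    @lru_cache(maxsize=None)
    def best(i, j):
        # Best result for operators i..n-1 when operator i opens a group on node j.
        candidates = []
        for k in range(i + 1, n + 1):  # operators i..k-1 share node j
            compute = sum(work[i:k]) / power[j]
            group = (j,) * (k - i)
            if k == n:  # the group reaches the end of the pipeline
                candidates.append((compute, group))
                continue
            for nxt in range(j + 1, m):  # next group goes to a later node
                transfer = out[k - 1] / bw[j][nxt]
                rest_time, rest_nodes = best(k, nxt)
                slowest = max(compute, transfer, rest_time)
                candidates.append((slowest, group + rest_nodes))
        return min(candidates)

    return min(best(0, j) for j in range(m))


if __name__ == "__main__":
    work = [4, 2, 6]                    # three operators
    out = [1, 1]                        # data sizes between consecutive operators
    power = [2, 1, 3]                   # three heterogeneous nodes
    bw = [[10] * 3 for _ in range(3)]   # uniform link bandwidth
    bottleneck, placement = map_pipeline(work, out, power, bw)
    print(bottleneck, placement, 1 / bottleneck)  # bottleneck time, placement, throughput
```

In the example, placing everything on the fastest node gives a bottleneck of 4 time units, while spreading the three operators across the three nodes balances each stage at 2 time units, which doubles the achievable throughput.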

Author information

Authors and Affiliations

Southern Illinois University Carbondale, Carbondale, IL, USA
Ali Al-Sinayyid
Montclair State University, Montclair, NJ, USA
Michelle Zhu

Corresponding author

Correspondence to Ali Al-Sinayyid.

About this article

Cite this article

Al-Sinayyid, A., Zhu, M. Job scheduler for streaming applications in heterogeneous distributed processing systems. J Supercomput 76, 9609–9628 (2020). https://doi.org/10.1007/s11227-020-03223-z

Published: 02 March 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s11227-020-03223-z
