
Modeling and analysis of distributed schedulers in data center cluster networks

Cluster Computing

Abstract

One of the goals of cloud service providers is to satisfy service-level agreements without significant over-provisioning in data center clusters. Efforts to meet these requirements have mainly relied on resource over-provisioning rather than on identifying performance bottlenecks. While increasing parallelism tends to reduce average and tail latency, the joint impact of concurrent job scheduling and parallel task processing is difficult to model analytically, particularly compared with models developed without the notion of concurrency. This article presents an analytical model for distributed schedulers in data center cluster networks. The model can be used to investigate how latency can affect data center network design and how many resources should be allocated to meet service-level agreements. To gain better insight, we build upon ideas from queuing networks, which provide a framework for measuring expected latency against resource provisioning. The model is based on tandem queuing networks and fork–join systems and computes expected latency in closed form at various stages of data center cluster networks. Theoretical analysis and simulations have been conducted to demonstrate the effectiveness of the proposed model and to strike a balance between expected latency and resource utilization. Results obtained from various simulation scenarios on different data center traffic traces confirm the soundness of the model.
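
As a rough textbook illustration of the latency-versus-provisioning trade-off (not the authors' model, which also includes fork–join stages), the sketch below assumes Poisson job arrivals at rate λ and a tandem of M/M/1 stages with exponential service rates μ_i. Under Jackson's theorem each stage then behaves as an M/M/1 queue, so the expected end-to-end latency is the sum of 1/(μ_i - λ). The function names `tandem_mm1_latency` and `min_uniform_rate_for_sla` are hypothetical.

```python
from typing import Sequence


def tandem_mm1_latency(arrival_rate: float, service_rates: Sequence[float]) -> float:
    """Expected end-to-end sojourn time through a tandem of M/M/1 stages.

    Assumes Poisson arrivals at `arrival_rate` and exponential service at each
    stage. By Jackson's theorem every stage behaves as an M/M/1 queue with the
    same arrival rate, so stage i contributes 1 / (mu_i - lambda) on average.
    """
    total = 0.0
    for mu in service_rates:
        if mu <= arrival_rate:
            raise ValueError("unstable stage: service rate must exceed arrival rate")
        total += 1.0 / (mu - arrival_rate)
    return total


def min_uniform_rate_for_sla(arrival_rate: float, stages: int, sla: float) -> float:
    """Smallest common service rate mu with stages / (mu - lambda) <= sla."""
    return arrival_rate + stages / sla


print(tandem_mm1_latency(8.0, [10.0, 12.0, 20.0]))  # 1/2 + 1/4 + 1/12
print(min_uniform_rate_for_sla(8.0, stages=3, sla=0.5))  # 14.0
```

The second function simply inverts the latency formula for identical stages; it ignores fork–join parallelism, which the article's model also includes.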


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13


Notes

  1. \(Pr[X_{1}, X_{2}, \ldots , X_{n}] = c \displaystyle \prod \nolimits _{i=1}^{n} Pr[X_{i}]\), where c is a constant.

  2. Queue length distribution is insensitive to the service time distribution.


Author information


Corresponding author

Correspondence to Hassan Peyravi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix

Order statistics

Let \(X_{1}, X_{2}, \ldots , X_{n}\) be mutually independent and identically distributed (iid) random variables, and let \(X_{(k)}\) denote the kth smallest of them (called the kth order statistic). Then

$$ \begin{array}{c} X_{(1)} = \min \left\{ X_{1}, X_{2}, \ldots , X_{n} \right\} , \\ X_{(n)} = \max \left\{ X_{1}, X_{2}, \ldots , X_{n} \right\} , \\ X_{(1)} \le X_{(2)} \le \cdots \le X_{(n-1)} \le X_{(n)}. \end{array} $$
(29)

The expected values of the maximum and the minimum of these n random variables can be found once the cumulative distribution functions (cdfs) of \(X_{(n)}\) and \(X_{(1)}\) are known.

1.1 Density and cumulative functions of the maximum

$$\begin{aligned} F_{max}(x)= & {} F_{X_{(n)}}(x) = P \left( X_{(n)} \le x\right) \\= & {} P \left( X_{(1)} \le x, X_{(2)} \le x, \ldots , X_{(n)} \le x \right) \\= & {} P \left( X_{1} \le x, X_{2} \le x, \ldots , X_{n} \le x \right) = F_{1}(x) F_{2}(x) \cdots F_{n}(x) = F^{n}(x), \end{aligned}$$
(30)
$$\begin{aligned} f_{max}(x)= & {} \frac{d}{dx} F^{n}(x) = n f(x) F^{n-1}(x). \end{aligned}$$
(31)
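
Equation (30) can be checked empirically. The sketch below assumes iid Uniform(0,1) variables (so F(x) = x) purely for illustration and compares the simulated fraction of trials with \(X_{(n)} \le x\) against \(F^{n}(x) = x^{n}\); the function name `empirical_max_cdf` is hypothetical.

```python
import random


def empirical_max_cdf(n: int, x: float, trials: int = 100_000, seed: int = 1) -> float:
    """Estimate P(X_(n) <= x) for n iid Uniform(0,1) draws by simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # The maximum is <= x exactly when every one of the n draws is <= x.
        if max(rng.random() for _ in range(n)) <= x:
            hits += 1
    return hits / trials


n, x = 5, 0.8
print(empirical_max_cdf(n, x), x ** n)  # Uniform(0,1): F(x) = x, so F^n(x) = x**n
```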

1.2 Density and cumulative functions of the minimum

$$\begin{aligned} F_{min}(x)= & {} F_{X_{(1)}}(x) = 1- P \left( X_{(1)}> x\right) = 1-P \left( X_{1}> x, X_{2}> x, \ldots , X_{n} > x \right) \\= & {} 1-\left( 1-F_{1}(x)\right) \left( 1-F_{2}(x)\right) \cdots \left( 1-F_{n}(x)\right) = 1-(1-F(x))^{n}, \end{aligned}$$
(32)
$$\begin{aligned} f_{min}(x)= & {} - \frac{d}{dx} (1-F(x))^{n} = n f(x) (1-F(x))^{n-1}. \end{aligned}$$
(33)
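
The density in Eq. (33) is the derivative of the cdf in Eq. (32). A quick finite-difference check confirms the two agree; it uses the hypothetical cdf F(x) = x^2 on [0, 1] (density f(x) = 2x), chosen only for illustration.

```python
from typing import Callable


def F(x: float) -> float:
    """Illustrative cdf on [0, 1] (an assumption, not from the article): F(x) = x**2."""
    return x * x


def f(x: float) -> float:
    """Density of the illustrative cdf: f(x) = 2x."""
    return 2.0 * x


def F_min(x: float, n: int) -> float:
    """Eq. (32): cdf of the minimum of n iid draws."""
    return 1.0 - (1.0 - F(x)) ** n


def f_min(x: float, n: int) -> float:
    """Eq. (33): density of the minimum of n iid draws."""
    return n * f(x) * (1.0 - F(x)) ** (n - 1)


def central_difference(g: Callable[[float], float], x: float, h: float = 1e-6) -> float:
    """Numerical derivative g'(x) by a symmetric difference quotient."""
    return (g(x + h) - g(x - h)) / (2.0 * h)


n = 4
for x in (0.1, 0.4, 0.7):
    numeric = central_difference(lambda t: F_min(t, n), x)
    print(x, numeric, f_min(x, n))
```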

1.3 Densities of the maximum and the minimum for the exponential distribution

Let \(X_{1}, X_{2}, \ldots , X_{n} \overset{iid}{\sim } Exp(\mu )\), then

$$ f_{min}(x) = n f(x) (1-F(x))^{n-1} = n \left( \mu e^{-\mu x}\right) \left[ 1- \left( 1-e^{-\mu x}\right) \right] ^{n-1} = n \mu e^{-n \mu x} $$
(34)

and

$$ f_{max}(x) = n f(x) (F(x))^{n-1} = n \left( \mu e^{-\mu x}\right) \left[ 1-e^{-\mu x} \right] ^{n-1}. $$
(35)
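
Equation (34) states that the minimum of n iid Exp(μ) variables is itself exponential with rate nμ. A small simulation (the parameter values and the function name `simulate_min_exponential` are arbitrary, illustrative choices) compares the empirical survival probability with \(e^{-n\mu x}\).

```python
import math
import random


def simulate_min_exponential(n: int, mu: float, trials: int = 100_000, seed: int = 7) -> list[float]:
    """Draw `trials` samples of min(X_1, ..., X_n) with X_i iid Exp(mu)."""
    rng = random.Random(seed)
    # random.expovariate takes the rate parameter (mean 1/mu).
    return [min(rng.expovariate(mu) for _ in range(n)) for _ in range(trials)]


n, mu, x = 4, 2.0, 0.1
samples = simulate_min_exponential(n, mu)
empirical_survival = sum(s > x for s in samples) / len(samples)
# Eq. (34) says X_(1) ~ Exp(n*mu), so P(X_(1) > x) = exp(-n*mu*x).
print(empirical_survival, math.exp(-n * mu * x))
```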

1.4 Expected values of the maximum and the minimum for the exponential distribution

For n iid random variables with density f and cdf F, the means of the maximum and the minimum are

$$\begin{aligned} E\left[ X_{(n)}\right]= & {} n \int _{-\infty }^{\infty} x f(x) F^{n-1} (x) dx, \end{aligned}$$
(36)
$$\begin{aligned} E\left[ X_{(1)}\right]= & {} n \int _{-\infty }^{\infty} x f(x) \left( 1 -F(x) \right) ^{n-1} dx. \end{aligned}$$
(37)
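
Equations (36) and (37) hold for any continuous distribution. As a sanity check outside the exponential case, the sketch below evaluates them with a midpoint rule for iid Uniform(0,1) variables, an illustrative choice for which the known closed forms are \(E[X_{(n)}] = n/(n+1)\) and \(E[X_{(1)}] = 1/(n+1)\); the function name `expected_order_stat` is hypothetical.

```python
def expected_order_stat(n: int, which: str, steps: int = 100_000) -> float:
    """Midpoint-rule value of Eq. (36) (which="max") or Eq. (37) (which="min")
    for n iid Uniform(0,1) variables, where f(x) = 1 and F(x) = x on [0, 1]."""
    if which not in ("max", "min"):
        raise ValueError("which must be 'max' or 'min'")
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h            # midpoint of the i-th subinterval
        density, cdf = 1.0, x        # f(x) and F(x) for Uniform(0,1)
        tail = cdf ** (n - 1) if which == "max" else (1.0 - cdf) ** (n - 1)
        total += x * density * tail
    return n * total * h


n = 3
print(expected_order_stat(n, "max"), n / (n + 1))
print(expected_order_stat(n, "min"), 1 / (n + 1))
```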

If \(X_{i} \overset{iid}{\sim } Exp (\mu )\) for \(i=1, 2, \ldots , n\), then

$$\begin{aligned} E\left[ X_{(n)}\right]= & {} n \int _{0}^{\infty} x f(x) F^{n-1} (x) dx \\= & {} n \int _{0}^{\infty} x \mu e^{-\mu x} \left( 1- e^{-\mu x} \right) ^{n-1} dx \\= & {} n \mu \int _{0}^{\infty} x e^{-\mu x} \left( 1- e^{-\mu x} \right) ^{n-1} dx = n \mu \int _{0}^{\infty} x e^{-\mu x} \sum _{k=0}^{n-1} \left( {\begin{array}{c}n-1\\ k\end{array}}\right) \left( -e^{-\mu x}\right) ^{k} dx \\= & {} n \mu \sum _{k=0}^{n-1} \left( {\begin{array}{c}n-1\\ k\end{array}}\right) (-1)^{k} \int _{0}^{\infty} x e^{-(k+1)\mu x} dx = n \mu \sum _{k=0}^{n-1} \left( {\begin{array}{c}n-1\\ k\end{array}}\right) (-1)^{k} \frac{1}{((k+1)\mu )^{2}} \\= & {} \frac{n}{\mu } \sum _{k=0}^{n-1} \left( {\begin{array}{c}n-1\\ k\end{array}}\right) (-1)^{k} \frac{1}{(k+1)^{2}} = J(n) \\= & {} \frac{1}{\mu } \sum _{k=1}^{n} \left( {\begin{array}{c}n\\ k\end{array}}\right) (-1)^{k-1} \frac{1}{k} = \frac{1}{\mu } \left( 1+\frac{1}{2}+\frac{1}{3}+ \cdots + \frac{1}{n}\right) = H_{n}/\mu , \end{aligned}$$
(38)

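where we used \(\int x e^{ax}\, dx = \left( \frac{x}{a} - \frac{1}{a^{2}} \right) e^{ax} + C\) with \(a = -(k+1)\mu \), the binomial theorem \((a+b)^{n} = \sum _{k=0}^{n} \left( {\begin{array}{c}n\\ k\end{array}}\right) a^{n-k} b^{k}\), the identity \(\frac{n}{k+1} \left( {\begin{array}{c}n-1\\ k\end{array}}\right) = \left( {\begin{array}{c}n\\ k+1\end{array}}\right) \) followed by the change of index \(k+1 \rightarrow k\), and Pascal's rule \(\left( {\begin{array}{c}n+1\\ k\end{array}}\right) = \left( {\begin{array}{c}n\\ k\end{array}}\right) + \left( {\begin{array}{c}n\\ k-1\end{array}}\right) \), which together with the identities above gives \(\sum _{k=1}^{n} \left( {\begin{array}{c}n\\ k\end{array}}\right) (-1)^{k-1} \frac{1}{k} = H_{n}\) by induction on n.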
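
The algebra in Eq. (38) can be verified exactly with rational arithmetic. The sketch below (function names are hypothetical) checks, for n = 1, ..., 10, that the alternating sum \(n \sum_{k=0}^{n-1} \binom{n-1}{k} (-1)^{k}/(k+1)^{2}\), which equals \(\mu E[X_{(n)}]\), its reindexed form \(\sum_{k=1}^{n} \binom{n}{k} (-1)^{k-1}/k\), and the harmonic number \(H_{n}\) all coincide. The sign \((-1)^{k-1}\) in the reindexed sum is what makes it equal to \(H_{n}\) rather than \(-H_{n}\).

```python
from fractions import Fraction
from math import comb


def alternating_form(n: int) -> Fraction:
    """mu * E[X_(n)] as in Eq. (38): n * sum_{k=0}^{n-1} C(n-1,k) (-1)^k / (k+1)^2."""
    return n * sum(Fraction(comb(n - 1, k) * (-1) ** k, (k + 1) ** 2) for k in range(n))


def reindexed_form(n: int) -> Fraction:
    """After n/(k+1) * C(n-1,k) = C(n,k+1) and j = k+1: sum_{j=1}^{n} C(n,j) (-1)^(j-1) / j."""
    return sum(Fraction(comb(n, j) * (-1) ** (j - 1), j) for j in range(1, n + 1))


def harmonic(n: int) -> Fraction:
    """H_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


for n in range(1, 11):
    assert alternating_form(n) == reindexed_form(n) == harmonic(n)
print(harmonic(4))  # 25/12
```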

$$\begin{aligned} E\left[ X_{(1)}\right]= & {} n \int _{-\infty }^{\infty} x f(x) \left( 1 -F(x) \right) ^{n-1} dx \\= & {} n \int _{0}^{\infty} x \mu e^{-\mu x} \left( e^{-\mu x}\right) ^{n-1} dx \\= & {} n \int _{0}^{\infty} x \mu e^{-n \mu x} dx \\= & {} \frac{1}{n\mu }. \end{aligned}$$
(39)
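
Equations (38) and (39) connect directly to fork–join latency. Under the illustrative assumption that a job is forked into n parallel tasks with iid Exp(μ) service times and no queueing delay, the job completes when its slowest task finishes, so its mean latency is \(H_{n}/\mu\), while the first task finishes after \(1/(n\mu)\) on average. The simulation below (the function name `fork_join_means` and the parameter values are hypothetical) compares both estimates with the closed forms.

```python
import random


def fork_join_means(n: int, mu: float, trials: int = 100_000, seed: int = 3) -> tuple[float, float]:
    """Monte Carlo estimates of E[X_(1)] and E[X_(n)] for n iid Exp(mu) task times.

    With n parallel tasks and no queueing, the job finishes when its slowest
    task does (the maximum); the first task to finish gives the minimum.
    """
    rng = random.Random(seed)
    total_min = total_max = 0.0
    for _ in range(trials):
        times = [rng.expovariate(mu) for _ in range(n)]
        total_min += min(times)
        total_max += max(times)
    return total_min / trials, total_max / trials


n, mu = 8, 1.0
est_min, est_max = fork_join_means(n, mu)
harmonic_n = sum(1.0 / k for k in range(1, n + 1))
print(est_min, 1.0 / (n * mu))   # Eq. (39)
print(est_max, harmonic_n / mu)  # Eq. (38)
```

Because \(H_{n}\) grows like \(\ln n\), the mean job latency in this idealized setting keeps increasing with the fan-out n even though each task's mean stays at \(1/\mu\).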


About this article


Cite this article

Alshahrani, R., Peyravi, H. Modeling and analysis of distributed schedulers in data center cluster networks. Cluster Comput 24, 3351–3366 (2021). https://doi.org/10.1007/s10586-021-03343-y


  • DOI: https://doi.org/10.1007/s10586-021-03343-y
