RICH: Strategy-proof and efficient coflow scheduling in non-cooperative environments

https://doi.org/10.1016/j.jnca.2021.103233

Abstract

Coflow scheduling can effectively improve application performance and has been studied extensively in cooperative environments (e.g., private datacenter networks), where fairness is not the primary concern. In non-cooperative environments (e.g., multi-tenant datacenter networks), coflow scheduling should be strategy-proof; otherwise, some tenants could unfairly acquire more resources by cheating the scheduler. Because minimizing coflow completion time (CCT) requires prioritizing coflows by some specific rule (e.g., shortest-coflow-first, smallest-effective-bottleneck-first), tenants can raise the priority of their coflows by lying about the coflow information. Thus, it is commonly believed that optimizing coflow performance inevitably violates strategy-proofness.

In this paper, we argue that the average CCT can be reduced without violating strategy-proofness. Our key insight is that prioritization inherently achieves better CCT, even without specific rules such as smallest-effective-bottleneck-first. We propose RICH, a coflow scheduler for non-cooperative environments. At its heart, RICH splits time into multiple rounds. In each round, RICH ensures that the total data transmitted by each tenant provides the optimal isolation guarantee. Across rounds, RICH prioritizes coflow transmission among tenants in a round-robin manner. In this way, all tenants are fairly prioritized, and tenants cannot gain more bandwidth by cheating. Extensive simulations show that RICH outperforms other strategy-proof mechanisms by up to 39.3% in terms of average CCT.

Introduction

Data-parallel applications such as MapReduce (Dean and Ghemawat, 2008) and Spark (Zaharia et al., 2010), which are widely used in cloud computing, are very common in datacenters. In these applications, a job usually involves communication between many pairs of servers, generating several parallel end-to-end flows. Generally, the communication does not finish until all flows have finished their transmissions. Therefore, optimizing the performance of individual flows does not necessarily improve application performance. The coflow abstraction (Chowdhury, 2015a) has been proposed to bridge this mismatch. A coflow is a collection of parallel flows with the same objective, and it does not complete until all its constituent flows have finished. Minimizing the coflow completion time (CCT) can therefore truly accelerate the corresponding job. To this end, tremendous efforts have been made to optimize coflow performance (Chowdhury et al., 2014, Chowdhury and Stoica, 2015, Zhang et al., 2016, Wang et al., 2017, Wang et al., 2018a).

Most of these works focus on minimizing the average CCT in cooperative environments such as private datacenter networks. In these environments, it is feasible to let different jobs collaborate to improve the overall application performance. Specifically, coflow scheduling algorithms prioritize some coflows over others to minimize the average CCT. For example, Varys (Chowdhury et al., 2014) preferentially allocates bandwidth to the coflows with smaller bottleneck completion times.

However, the same mechanisms cannot be employed directly in non-cooperative environments such as multi-tenant datacenter networks. In these environments, various tenants from different entities share the same resources, and a tenant is not willing to yield resources to other tenants. Prioritizing coflows among different tenants according to a specific rule such as smallest-coflow-first can therefore encourage tenants to lie about their demands. Thus, a scheduling algorithm should be strategy-proof: a tenant should not be able to obtain more bandwidth by lying (Chowdhury et al., 2016). In fact, it has long been believed that strategy-proofness and minimizing CCT are conflicting objectives (Chowdhury et al., 2016).
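The incentive to lie can be illustrated with a minimal sketch. It assumes a single link of fixed capacity and a hypothetical smallest-coflow-first scheduler that orders tenants by their reported sizes, while transmission time still depends on the true sizes. The function name `scf_completion_times`, the tenant labels, and the numbers are illustrative assumptions, not taken from the paper.

```python
def scf_completion_times(true_sizes, reported_sizes, capacity=1.0):
    """Serve coflows one at a time, smallest *reported* size first.

    Assumption: one shared link; each coflow needs true_size / capacity
    time units once it is scheduled, whatever size was reported.
    """
    order = sorted(true_sizes, key=lambda tenant: reported_sizes[tenant])
    now = 0.0
    finish = {}
    for tenant in order:
        now += true_sizes[tenant] / capacity
        finish[tenant] = now
    return finish


if __name__ == "__main__":
    true_sizes = {"A": 10, "B": 5}
    # Truthful reports: B is smaller, so B goes first.
    print(scf_completion_times(true_sizes, {"A": 10, "B": 5}))
    # Tenant A under-reports its size and jumps ahead of B.
    print(scf_completion_times(true_sizes, {"A": 1, "B": 5}))
```

In this toy setting, tenant A finishes at time 15 when it reports truthfully and at time 10 when it under-reports, while tenant B is pushed from time 5 to time 15. A rule that depends on reported coflow properties is therefore open to this kind of manipulation.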

Recent works on coflow scheduling in non-cooperative environments either give up prioritization to ensure strategy-proofness (Chowdhury et al., 2016, Ghodsi et al., 2011) or trade off strategy-proofness for performance (Wang et al., 2017). DRF (Ghodsi et al., 2011) and HUG (Chowdhury et al., 2016) split the bandwidth equally among the competing coflows in a max–min fair manner. However, without prioritization, they cannot achieve satisfactory CCT performance. For example, HUG incurs a 1.45× longer average MapReduce shuffle completion time than Varys (Chowdhury et al., 2016). On the other hand, a series of mechanisms (Wang et al., 2017, Wang et al., 2018a, Wang and Jin, 2016) show that CCT can be significantly reduced by trading off strategy-proofness. For example, Coflex (Wang et al., 2017) uses a tunable fairness knob to offer a flexible tradeoff between fairness and efficiency. Utopia (Wang et al., 2018a) can achieve near-optimal CCT performance with a provable isolation guarantee. However, even though these mechanisms are designed with long-term fairness in mind, they can suffer from serious unfairness in some specific situations (Section 2).

In this paper, we argue that the CCT can be further reduced without violating strategy-proofness. This is based on two insights. First, prioritization does not necessarily violate strategy-proofness. A violation only happens when the prioritization rules are determined solely by the properties of the coflows (e.g., size, length, width) and can be successfully inferred by the tenants. As long as the tenants are not able to discover the rules of prioritization, they cannot obtain more bandwidth by cheating. Second, regardless of the scheduling order, prioritizing coflows inherently achieves lower CCT than the max–min division of bandwidth. In other words, even when prioritization follows no specific rule such as smallest-coflow-first, the CCT can still be reduced compared with DRF and HUG.
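The second insight can be checked with a small worked example. The sketch below covers only a simplified case that is assumed here for illustration: n coflows of equal size competing for a single link. With equal sizes the serving order does not affect the average, which is why no ordering rule is needed. The function names are hypothetical.

```python
def equal_share_ccts(n, size, capacity=1.0):
    """All n coflows progress at capacity / n, so they finish together."""
    return [n * size / capacity] * n


def prioritized_ccts(n, size, capacity=1.0):
    """Coflows are served one at a time in an arbitrary order.

    The k-th coflow in line finishes at k * size / capacity.
    """
    return [k * size / capacity for k in range(1, n + 1)]


def average(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    n, size = 4, 2
    print(average(equal_share_ccts(n, size)))   # every coflow waits until the end
    print(average(prioritized_ccts(n, size)))   # earlier coflows finish sooner
```

For four coflows of size 2 on a unit-capacity link, equal sharing gives every coflow a completion time of 8, while serving them one after another gives completion times 2, 4, 6, and 8, for an average of 5. No coflow finishes later than under equal sharing, and all but the last finish earlier.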

We propose RICH, a Round-robIn Coflow scHeduler. RICH splits time into multiple rounds. Similar to HUG, RICH ensures that the overall bandwidth allocated to each tenant in each round provides the optimal isolation guarantee. Unlike HUG, RICH prioritizes coflows inside each round to minimize CCT. Across rounds, the prioritization is conducted in a round-robin manner among all tenants. For example, if tenant A is prioritized over tenant B in the first round, tenant B is prioritized over tenant A in the second round. In this way, all tenants are fairly prioritized. Furthermore, as the prioritization order is independent of the coflow properties, a tenant cannot gain more bandwidth by lying about its coflow information.
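The round structure described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the RICH algorithm itself: it models a single link, gives every tenant the same per-round data quota as a stand-in for the isolation guarantee, and ends a round early if a tenant needs less than its quota. The name `round_robin_schedule` and its parameters are hypothetical.

```python
def round_robin_schedule(demands, quota, capacity=1.0):
    """Return per-tenant completion times under rotating priority.

    demands: dict mapping tenant -> total data to send.
    quota:   data each tenant may send per round (assumed equal for all).
    The priority order depends only on the round number, never on demands.
    """
    if quota <= 0:
        raise ValueError("quota must be positive")
    tenants = list(demands)
    remaining = dict(demands)
    finish = {}
    now = 0.0
    round_no = 0
    while len(finish) < len(tenants):
        # Rotate the priority order by one position every round.
        shift = round_no % len(tenants)
        order = tenants[shift:] + tenants[:shift]
        for tenant in order:
            if tenant in finish:
                continue
            sent = min(quota, remaining[tenant])
            now += sent / capacity
            remaining[tenant] -= sent
            if remaining[tenant] <= 0:
                finish[tenant] = now
        round_no += 1
    return finish


if __name__ == "__main__":
    print(round_robin_schedule({"A": 4, "B": 4}, quota=2))
```

With two tenants that each send 4 units and a quota of 2 per round, A goes first in the first round and B goes first in the second. B finishes at time 6 and A at time 8, whereas equal sharing of the link would finish both at time 8. Because the rotation ignores reported coflow properties, misreporting a size cannot change a tenant's position in any round of this sketch.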

We evaluate RICH through extensive trace-driven simulations with realistic settings. RICH outperforms HUG by up to 2× in terms of average CCT. Meanwhile, RICH can ensure fair bandwidth allocation among tenants at the granularity of one round.

Background and motivation

In this section, we begin by presenting the background of the coflow abstraction and the non-cooperative environment. Then we show the problems of existing coflow schedulers in non-cooperative environments.

RICH design

Due to the limitations of existing approaches, we design RICH, a strategy-proof scheduler aiming to efficiently schedule coflows.

Evaluation

We evaluated RICH through extensive simulations using realistic Hive/MapReduce traces collected from a large production cluster at Facebook. The main results are listed as follows:

  • [CCT Performance] RICH outperforms other strategy-proof mechanisms by up to 39.3% in terms of average CCT.

  • [Strategy-proofness] RICH can prevent tenants from gaining more bandwidth by cheating, effectively ensuring strategy-proofness.

  • [Fairness] RICH is able to fairly allocate bandwidth to different tenants at the time

Discussion

We consider a special case in which a tenant can somehow divide its large coflow into a set of smaller coflows, each of which can finish within one round in HUG for most rounds. In the long run, this tenant can then always get itself prioritized over other tenants’ coflows, most of which cannot finish in one round.

Related work

In cooperative environments, many coflow schedulers have been proposed to optimize the CCT performance. Orchestra (Chowdhury et al., 2011) and Baraat (Dogar et al., 2014) improve the average CCT based on FIFO. Varys (Chowdhury et al., 2014) improves performance by using SEBF heuristics for inter-coflow scheduling and minimum-allocation-for-desired-duration (MADD) for intra-coflow scheduling. For minimizing the total weighted completion time of coflows, Qiu et al. (2015) proposed the first

Conclusion

In this paper, we have proposed RICH, a new coflow scheduler for non-cooperative multi-tenant datacenter networks. RICH is strategy-proof and can reduce the average CCT of coflows. To prevent tenants from gaining more bandwidth by cheating, RICH employs multi-round scheduling, distributes the amount of data each tenant transfers in each round in a max–min manner, and prioritizes coflow transmission among tenants in a round-robin manner across rounds. To guarantee optimal

CRediT authorship contribution statement

Fan Zhang: Methodology, Investigation, Software, Validation, Formal analysis, Visualization, Writing – original draft. Yazhe Tang: Writing – review & editing, Supervision, Funding acquisition. Danfeng Shan: Conceptualization, Methodology, Writing – review & editing, Supervision, Funding acquisition. Huanzhao Wang: Funding acquisition. Chengchen Hu: Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant U19B2025, Grant 61902307, and Grant 61672425, in part by the Fundamental Research Funds for the Central Universities under xzy012020014, and in part by the Natural Science Basic Research Plan in Shaanxi Province of China Program under Grant 2018JM6109 and Grant 2016JM6066.

Fan Zhang is a Ph.D. student in the School of Computer Science and Technology at Xi’an Jiaotong University. His research interests are datacenter networks.

References (30)

  • Abadi, M., et al. Tensorflow: A system for large-scale machine learning.
  • Bai, W., Chen, L., Chen, K., Han, D., Tian, C., Wang, H., 2015. Information-agnostic flow scheduling for commodity data...
  • Chowdhury, N., 2015. Coflow: A Networking Abstraction for Distributed Data-Parallel Applications.
  • Chowdhury, M., 2015. Coflow-benchmark.
  • Chowdhury, M., 2017. Coflowsim.
  • Chowdhury, M., et al. HUG: Multi-resource fairness for correlated and elastic demands.
  • Chowdhury, M., et al. Efficient coflow scheduling without prior knowledge.
  • Chowdhury, M., et al. Managing data transfers in computer clusters with orchestra.
  • Chowdhury, M., et al. Efficient coflow scheduling with varys.
  • Dean, J., et al., 2008. MapReduce: simplified data processing on large clusters. Commun. ACM.
  • Dogar, F.R., et al. Decentralized task-aware scheduling for data center networks.
  • Ghodsi, A., et al. Dominant resource fairness: Fair allocation of multiple resource types.
  • Jain, R.K., et al., 1984. A Quantitative Measure of Fairness and Discrimination. Tech. Rep. DEC Research Report TR-301.
  • Jajoo, A., Hu, Y.C., Lin, X., 2019. Your coflow has many flows: sampling them for fun and speed. In: 2019 USENIX...
  • Li, Y., et al. Efficient online coflow routing and scheduling.

    Yazhe Tang received his B.Sc., M.Sc. and Ph.D. degrees from Xi’an Jiaotong University, Xi’an, China, in 1993, 1996 and 2002, respectively. He worked as a Post-Doctoral Research Associate in the Department of Computer Science at the University of Western Ontario, Canada from 2004 to 2006. He is now an Associate Professor and Vice Dean in the School of Computer Science and Technology, Xi’an Jiaotong University. His main research interests include computer networking systems.

    Danfeng Shan received the B.E. degree in computer science and technology from Xi’an Jiaotong University, China, in 2013, and the Ph.D. degree in computer science and technology from Tsinghua University, China, in 2018. He is currently an Assistant Professor with the School of Computer Science and Technology, Xi’an Jiaotong University. His research interests include datacenter networks and congestion control.

    Huanzhao Wang received her Ph.D. degree in computer science from Xi’an Jiaotong University, China, in 2009. She is now an Associate Professor in the School of Computer Science and Technology, Xi’an Jiaotong University. Her research interests include wireless networks and software-defined networks.

    Chengchen Hu is now Chief Expert and AVP at NIO Inc. Prior to joining NIO, he was a Principal Engineer and the founding director of Xilinx Labs Asia Pacific located in Singapore. Before his experience with Xilinx, he was a Professor and the Department Head at the Department of Computer Science and Technology, Xi’an Jiaotong University in P. R. China. He is recipient of the New Century Excellent Talents in University award from the Ministry of Education, China, a fellowship from the European Research Consortium for Informatics and Mathematics (ERCIM), a fellowship of Microsoft “Star-Track” Young Faculty. His research theme is to monitor, diagnose and manage networking and distributed computing through hardware optimized and software-defined systematical approaches.
