Abstract
As link speeds have grown steadily from 10 Gbps to 100 Gbps, high-speed data center networks (DCNs) require more efficient congestion management. Proactive transports, especially credit-based congestion control, have therefore drawn much attention because of their fast convergence, near-zero queueing, and low latency. In real deployments, however, it is hard to guarantee that one protocol is deployed on every host at the same time. When credit-based protocols are deployed into DCNs incrementally, the network enters a multi-protocol state and faces three fundamental challenges: (i) unfairness, (ii) non-convergence, and (iii) high buffer occupancy. In this paper, we propose a new protocol, called CCRP, that aims to converge credit-based and reactive protocols in data centers. Targeting the most widely deployed protocol in DCNs, i.e., DCQCN, which is based on explicit congestion notification (ECN), CCRP leverages forward ECN to detect network congestion in the data queue and optimizes the feedback control of credit-based transports. Our experimental results show that this design addresses unfair link allocation and converges rapidly with reactive protocols. Furthermore, CCRP achieves high utilization and low buffer occupancy at the same time.
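The abstract does not give CCRP's control law, so the following is only a toy sketch of the general idea it names: a receiver that paces credits and uses ECN marks on arriving data packets as a congestion signal. Everything here is an assumption for illustration, including the class name `CreditRateController`, the DCTCP-style smoothing gain, and the additive-increase step; none of it is taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class CreditRateController:
    """Toy receiver-side credit pacer (hypothetical; not CCRP's actual algorithm)."""

    max_rate: float              # credit rate that would fill the link (credits/s)
    min_rate: float              # floor so the flow never stalls completely
    rate: float                  # current credit sending rate
    gain: float = 0.0625         # EWMA gain for the marked fraction (assumed)
    increase_step: float = 0.05  # additive increase as a fraction of max_rate (assumed)
    alpha: float = 0.0           # smoothed fraction of ECN-marked data packets

    def update(self, data_packets: int, ecn_marked: int) -> float:
        """Adjust the credit rate once per update period from ECN feedback."""
        if data_packets <= 0:
            return self.rate  # no feedback this period; keep the rate unchanged
        fraction = ecn_marked / data_packets
        self.alpha = (1 - self.gain) * self.alpha + self.gain * fraction
        if ecn_marked > 0:
            # Marks mean the data queue is building: back off in proportion
            # to the smoothed congestion level.
            self.rate *= 1 - self.alpha / 2
        else:
            # No marks: probe additively for spare bandwidth.
            self.rate += self.increase_step * self.max_rate
        # Keep the rate within the configured bounds.
        self.rate = max(self.min_rate, min(self.max_rate, self.rate))
        return self.rate
```

In this sketch the receiver would call `update` once per feedback period with the number of data packets received and how many of them carried an ECN mark. The point is only that reacting to the same ECN signal that reactive senders use gives a credit-based flow a way to yield or reclaim bandwidth when sharing a queue with them.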
Acknowledgements
We would like to thank the anonymous reviewers for their insightful comments. We gratefully acknowledge the members of the Tianhe interconnect group at NUDT for many inspiring conversations. This work was supported by the National Key R&D Program of China under Grant No. 2018YFB0204300.
About this article
Cite this article
Bai, Y., Hu, D., Dong, D. et al. CCRP: Converging Credit-Based and Reactive Protocols in Datacenters. Int J Parallel Prog 49, 685–699 (2021). https://doi.org/10.1007/s10766-021-00698-y
DOI: https://doi.org/10.1007/s10766-021-00698-y