Django: Bilateral coflow scheduling with predictive concurrent connections
Journal of Parallel and Distributed Computing (IF 3.4), Pub Date: 2021-02-17, DOI: 10.1016/j.jpdc.2021.01.006
Jiaqi Zheng, Liulan Qin, Kexin Liu, Bingchuan Tian, Chen Tian, Bo Li, Guihai Chen

The communication of data-parallel frameworks is highly structured. Coflow is a networking abstraction that captures the all-or-nothing, job-specific semantics of this communication. Minimizing coflow completion time (CCT) decreases the completion time of the corresponding jobs. However, state-of-the-art coflow scheduling approaches suffer from several drawbacks. On the one hand, both sender-driven and receiver-driven scheduling approaches fail to achieve optimal performance, especially when a bandwidth bottleneck exists. On the other hand, they do not optimize the number of concurrent connections, even though too many or too few concurrent connections can prolong the CCT.
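To make the all-or-nothing semantics concrete, the following is a minimal sketch of how CCT could be computed in a toy model. It assumes a single shared bottleneck link that serves coflows one after another; the function names, the `bandwidth` parameter, and the example flow sizes are hypothetical illustrations and are not taken from the paper or from Django's scheduling algorithms.

```python
def coflow_cct(flow_finish_times):
    """All-or-nothing: a coflow completes only when its slowest flow completes."""
    return max(flow_finish_times)


def average_cct(coflows, order, bandwidth=1.0):
    """Average CCT when coflows are served sequentially on one bottleneck link.

    coflows:   dict mapping a coflow name to a list of flow sizes (toy units)
    order:     list of coflow names giving the service order
    bandwidth: link capacity in size units per time unit (assumed constant)
    """
    clock = 0.0
    ccts = []
    for name in order:
        finish_times = []
        for size in coflows[name]:
            clock += size / bandwidth      # flows of a coflow are sent back to back
            finish_times.append(clock)
        ccts.append(coflow_cct(finish_times))
    return sum(ccts) / len(ccts)


# Hypothetical workload: one large coflow and one small coflow.
coflows = {"A": [4.0, 2.0], "B": [1.0, 1.0]}
print(average_cct(coflows, ["A", "B"]))  # large coflow first: 7.0
print(average_cct(coflows, ["B", "A"]))  # small coflow first: 5.0
```

In this toy setting, serving the smaller coflow first lowers the average CCT, which illustrates why the order in which coflows are scheduled matters. It does not model multiple ports, sender-side or receiver-side decisions, or the number of concurrent connections.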

In this paper, we propose Django, a bilateral coflow scheduling framework. We first use a Support Vector Machine (SVM) as the machine learning model to automatically identify the optimal number of concurrent connections, i.e., the queue limitation in the switch. Based on the predicted results, we further develop a set of scalable, distributed coflow scheduling algorithms. Testbed experiments and trace-driven simulations show that Django estimates the number of concurrent connections with an accuracy of 98% and reduces the average CCT and the 95th percentile CCT by 15% and 40%, respectively.




Updated: 2021-03-07