Network-Aware Locality Scheduling for Distributed Data Operators in Data Centers
IEEE Transactions on Parallel and Distributed Systems ( IF 5.6 ) Pub Date : 2021-01-20 , DOI: 10.1109/tpds.2021.3053241
Long Cheng , Ying Wang , Qingzhi Liu , Dick H.J. Epema , Cheng Liu , Ying Mao , John Murphy

Large data centers are currently the mainstream infrastructure for big data processing. The efficient execution of distributed data operators (e.g., join and aggregation), one of the most fundamental tasks in these environments, remains a challenge for current data systems, and one of the key performance issues is network communication time. State-of-the-art methods that address this problem focus either on application-layer data locality optimization to reduce network traffic or on network-layer data flow optimization to increase bandwidth utilization. However, the techniques in the two layers are totally independent of each other, and the performance gains from a joint optimization perspective have not yet been explored. In this article, we propose a novel approach called NEAL (NEtwork-Aware Locality scheduling) to bridge this gap and thereby further reduce communication time for distributed big data operators. We present the detailed design and implementation of NEAL, and our experimental results demonstrate that NEAL always performs better than current approaches for different workloads and network bandwidth configurations.

Updated: 2021-02-12