Toward efficient execution of data-intensive workflows
The Journal of Supercomputing (IF 2.5), Pub Date: 2021-01-12, DOI: 10.1007/s11227-020-03612-4
Oleg Sukhoroslov

Workflows that consume and produce large amounts of data are widely used in modern scientific computing and data processing pipelines. Scheduling data-intensive workflows requires careful management of data transfers between tasks, since network contention can significantly impact the workflow execution time. The paper presents and evaluates several scheduling algorithms, data transfer strategies and optimizations aimed at efficient execution of data-intensive workflows. The studied approaches reduce or completely avoid network contention by explicitly scheduling data transfers, and they incorporate several optimizations, such as data caching, chunked transfers and peer-to-peer data transfers. The results of an experimental study demonstrate that the relative performance of the different approaches depends on the workflow properties, the data staging strategy and the system configuration. The proposed CAS-L1 heuristic, combined with additional data transfer optimizations, achieves the best results.
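The following minimal sketch shows why explicitly scheduling transfers can help even when a link's total capacity is fixed. It does not reproduce the paper's CAS-L1 heuristic or its experimental setup. It assumes an idealized model in which a single shared link splits its bandwidth equally among concurrent transfers. The function names, sizes and bandwidth values are hypothetical.

```python
from typing import List


def fair_share_finish_times(sizes: List[float], bandwidth: float) -> List[float]:
    """Finish times when all transfers start at t=0 and split one link equally.

    Idealized processor-sharing model (an assumption, not the paper's model):
    with k active transfers, each receives bandwidth / k until the smallest
    remaining transfer completes.
    """
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    finish = [0.0] * len(sizes)
    t = 0.0
    delivered = 0.0  # data already received by every still-active transfer
    active = len(sizes)
    for i in order:
        # time for the smallest active transfer to receive the rest of its data
        t += (sizes[i] - delivered) * active / bandwidth
        delivered = sizes[i]
        finish[i] = t
        active -= 1
    return finish


def sequential_finish_times(sizes: List[float], bandwidth: float) -> List[float]:
    """Finish times when transfers are explicitly scheduled one at a time,
    in the given order, each using the full link bandwidth."""
    t = 0.0
    finish = []
    for size in sizes:
        t += size / bandwidth
        finish.append(t)
    return finish


if __name__ == "__main__":
    sizes = [100.0, 100.0]  # two input files (e.g. MB), hypothetical values
    bandwidth = 10.0        # shared link bandwidth (e.g. MB/s), hypothetical
    compute = 5.0           # run time of each consumer task once its input arrives
    for name, fn in [("fair share", fair_share_finish_times),
                     ("sequential", sequential_finish_times)]:
        task_done = [f + compute for f in fn(sizes, bandwidth)]
        print(name, task_done, sum(task_done) / len(task_done))
```

In both schedules the last file arrives at t = 20. Serializing the transfers, however, lets the first consumer task start at t = 10 instead of t = 20. This lowers the mean task completion time from 25 to 20 in this toy setting. Real networks share bandwidth less evenly than this model assumes. The sketch also omits the caching, chunked and peer-to-peer optimizations studied in the paper.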



Updated: 2021-01-12