RHEEMix in the data jungle: a cost-based optimizer for cross-platform systems
The VLDB Journal (IF 4.2), Pub Date: 2020-05-18, DOI: 10.1007/s00778-020-00612-x
Sebastian Kruse, Zoi Kaoudi, Bertty Contreras-Rojas, Sanjay Chawla, Felix Naumann, Jorge-Arnulfo Quiané-Ruiz

Data analytics are moving beyond the limits of a single platform. In this paper, we present the cost-based optimizer of Rheem, an open-source cross-platform system that copes with these new requirements. The optimizer allocates the subtasks of data analytic tasks to the most suitable platforms. Our main contributions are: (i) a mechanism based on graph transformations to explore alternative execution strategies; (ii) a novel graph-based approach to determine efficient data movement plans among subtasks and platforms; and (iii) an efficient plan enumeration algorithm, based on a novel enumeration algebra. We extensively evaluate our optimizer under diverse real tasks. We show that our optimizer can perform tasks more than one order of magnitude faster when using multiple platforms than when using a single platform.

Updated: 2020-05-18