Efficiently Translating Complex SQL Query to MapReduce Jobflow on Cloud
IEEE Transactions on Cloud Computing (IF 5.3), Pub Date: 2020-04-01, DOI: 10.1109/tcc.2017.2700842
Zhiang Wu, Aibo Song, Jie Cao, Junzhou Luo, Lu Zhang

MapReduce is a widely used programming model in cloud environments for processing large-scale data sets in parallel. Combining a high-level language with a SQL-to-MapReduce translator lets programmers write in a SQL-like declarative language, with each program then compiled automatically into a MapReduce jobflow. This approach helps narrow the gap between non-professional users and cloud platforms, and thus significantly improves the usability of the cloud. Although a number of translators have been developed, the auto-generated MapReduce programs still suffer from severe inefficiency. In this paper, we present an efficient Cost-Aware SQL-to-MapReduce Translator (CAT). CAT has two notable features. First, it defines two intra-SQL correlations, Generalized Job Flow Correlation (GJFC) and Input Correlation (IC), based on which a set of looser merging rules is introduced. Building on these rules, both Top-Down (TD) and Bottom-Up (BU) merging strategies are proposed and integrated into CAT. Second, it adopts a cost estimation model for MapReduce jobflows to guide the selection of the more efficient of the jobflows auto-generated by the TD and BU merging strategies. Finally, comparative experiments on the TPC-H benchmark demonstrate the effectiveness and scalability of CAT.
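The cost-guided selection step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Job` fields, the toy `estimate_cost` formula (bytes read, shuffled, and written plus a fixed per-job overhead), and the byte counts are hypothetical, and none of them is the cost model or the merging logic defined in the paper. The sketch shows only the general idea of scoring two candidate jobflows and keeping the cheaper one.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    """One MapReduce job, described only by hypothetical byte counts."""
    name: str
    input_bytes: int
    shuffle_bytes: int
    output_bytes: int


def estimate_cost(jobflow: List[Job], per_job_overhead: int = 5) -> int:
    """Toy cost (an assumption, not the paper's model): I/O volume plus a fixed overhead per job."""
    return sum(
        job.input_bytes + job.shuffle_bytes + job.output_bytes + per_job_overhead
        for job in jobflow
    )


def choose_jobflow(candidates: Dict[str, List[Job]]) -> str:
    """Return the label of the candidate jobflow with the lowest estimated cost."""
    return min(candidates, key=lambda label: estimate_cost(candidates[label]))


# Two made-up candidate jobflows for the same query. Here the "TD" candidate
# happens to have merged the join and aggregation into one job, while the "BU"
# candidate kept them separate. The numbers are invented for the example.
candidates = {
    "TD": [Job("join+agg", 100, 40, 10), Job("sort", 10, 10, 10)],
    "BU": [Job("join", 100, 40, 30), Job("agg", 30, 10, 10), Job("sort", 10, 10, 10)],
}

chosen = choose_jobflow(candidates)
print(chosen, estimate_cost(candidates[chosen]))
```

Which strategy wins depends entirely on the query and the cost model; the invented numbers above say nothing about how TD and BU compare in practice.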

Updated: 2020-04-01