AQUA+: Query Optimization for Hybrid Database-MapReduce System, Knowledge and Information Systems


Knowledge and Information Systems (IF 2.5), Pub Date: 2021-02-05, DOI: 10.1007/s10115-020-01542-4
Zhifei Pang, Sai Wu, Haichao Huang, Zhouzhenyan Hong, Yuqing Xie

MapReduce is widely recognized as an efficient tool for large-scale data analysis. It achieves high performance by exploiting parallelism among processing nodes while providing a simple interface for upper-layer applications. However, many existing applications maintain their data in a distributed database, and exporting that data into the storage system of MapReduce (normally a distributed file system) is costly. Moreover, compared to MapReduce, a database is equipped with many state-of-the-art techniques, such as indexes and a query optimizer. Therefore, a hybrid Database-MapReduce system that inherits the advantages of both systems is preferable. In this paper, we propose AQUA+, a query optimizer tailored for such a hybrid system. AQUA+ extends our previous system, AQUA. It generates a plan that adaptively assigns operators to the database engine and the MapReduce engine to optimize performance. The intuition is to exploit indexes, co-partitioning, and other features provided by the database as much as possible and to reduce the volume of data processed by MapReduce. Because query optimization is complex, AQUA+ introduces a novel tuning technique, learning to optimize. In particular, two neural networks are trained, one to predict cost and one to refine the query plan. Both are trained on our log of real query processing. Experiments on our in-house cluster confirm the effectiveness of our query optimizer.
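
To make the idea of assigning operators to two engines concrete, here is a minimal sketch. It is not the AQUA+ algorithm, which the abstract does not spell out; it is a toy dynamic program over a linear operator pipeline, assuming each operator has a known cost on each engine plus a cost for shipping its output to the other engine. The `Operator` class, the `assign_engines` function, and all cost numbers are hypothetical. In AQUA+ the costs would come from a learned model rather than from constants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    name: str
    db_cost: float        # hypothetical cost if run in the database engine
    mr_cost: float        # hypothetical cost if run in the MapReduce engine
    transfer_cost: float  # hypothetical cost of shipping this operator's output across engines


def assign_engines(pipeline):
    """Return (total_cost, engines) for the cheapest engine assignment of a linear pipeline."""
    engines = ("db", "mr")
    first = pipeline[0]
    # best[e] = (cost, assignment) of the cheapest plan so far whose last operator runs on e
    best = {"db": (first.db_cost, ["db"]), "mr": (first.mr_cost, ["mr"])}
    for prev, op in zip(pipeline, pipeline[1:]):
        nxt = {}
        for e in engines:
            run = op.db_cost if e == "db" else op.mr_cost
            candidates = []
            for p in engines:
                cost, assignment = best[p]
                # Pay the transfer cost only when consecutive operators run on different engines.
                move = 0.0 if p == e else prev.transfer_cost
                candidates.append((cost + move + run, assignment + [e]))
            nxt[e] = min(candidates, key=lambda c: c[0])
        best = nxt
    return min(best.values(), key=lambda c: c[0])


# Illustrative numbers only: the indexed scan is cheap in the database,
# while the final aggregation is cheaper in MapReduce.
pipeline = [
    Operator("scan+filter", db_cost=2, mr_cost=20, transfer_cost=3),
    Operator("join", db_cost=5, mr_cost=9, transfer_cost=4),
    Operator("aggregate", db_cost=12, mr_cost=4, transfer_cost=1),
]
total_cost, engines = assign_engines(pipeline)
print(total_cost, engines)
```

With these made-up costs the sketch keeps the selective scan and the join in the database and hands only the aggregation to MapReduce, which mirrors the stated intuition of using database features first and shrinking the data that MapReduce has to process.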


Updated: 2021-02-07
