GPU-based efficient join algorithms on Hadoop
The Journal of Supercomputing (IF 3.3) Pub Date: 2020-04-03, DOI: 10.1007/s11227-020-03262-6
Hongzhi Wang, Ning Li, Zheng Wang, Jianing Li

Growing data volumes put heavy pressure on query processing and storage, so many studies use GPUs to accelerate the join, one of the most important operations in modern database systems. However, existing GPU-accelerated join methods are poorly suited to joins over big data. To address this, this paper combines Hadoop with GPUs to speed up nested loop join, hash join, and theta join; it is also the first work to use GPUs to accelerate theta join. After data pre-filtering and pre-processing, the proposed approach uses MapReduce and HDFS in Hadoop to handle larger tables than existing GPU acceleration methods can. MapReduce also lets the proposed algorithm estimate the number of results more accurately and allocate appropriately sized storage without unnecessary cost, which makes it more efficient. Experimental results show that our approach is 1.5–2 times faster than a GPU-based approach without Hadoop, and 1.3–2 times faster than Hadoop-based approaches without GPUs.
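The abstract does not give the algorithm itself, so the following is only a minimal CPU-side sketch of the general idea it describes: split both tables into blocks, count the results per block pair first, then allocate output storage of exactly the required size before writing. The count-then-write scheme, the function names (`split_blocks`, `count_matches`, `theta_join`), and the `block_size` parameter are all assumptions made for illustration, not the authors' method. Each block pair stands in for the unit of work that a map task might hand to a GPU.

```python
from itertools import accumulate
from operator import lt


def split_blocks(rows, block_size):
    """Cut a table into fixed-size blocks (hypothetical unit of work per task)."""
    return [rows[i:i + block_size] for i in range(0, len(rows), block_size)]


def count_matches(left_block, right_block, theta):
    """Pass 1: count matching pairs in one block pair without storing them."""
    return sum(1 for l in left_block for r in right_block if theta(l, r))


def theta_join(left, right, theta, block_size=2):
    """Blocked nested-loop theta join with exact output pre-allocation."""
    left_blocks = split_blocks(left, block_size)
    right_blocks = split_blocks(right, block_size)
    pairs = [(lb, rb) for lb in left_blocks for rb in right_blocks]

    # Pass 1: exact result count for every block pair.
    counts = [count_matches(lb, rb, theta) for lb, rb in pairs]

    # Exclusive prefix sum: offsets[i] is where block pair i starts writing.
    offsets = [0] + list(accumulate(counts))
    total = offsets[-1]

    # Allocate the output once, at exactly the required size.
    out = [None] * total

    # Pass 2: redo the nested loop, writing each match at its reserved slot.
    for (lb, rb), start in zip(pairs, offsets):
        pos = start
        for l in lb:
            for r in rb:
                if theta(l, r):
                    out[pos] = (l, r)
                    pos += 1
    return out


# Example: join on the non-equality predicate left < right.
left = [1, 5, 3, 8]
right = [2, 4, 6]
result = theta_join(left, right, lt)
```

Because every block pair owns a disjoint slice of the output, the second pass could in principle run in parallel without synchronizing on the output buffer, which is why this pattern is common in GPU join code. The trade-off is that the predicate is evaluated twice.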

Updated: 2020-04-03