One size does not fit all: accelerating OLAP workloads with GPUs
Distributed and Parallel Databases (IF 1.2), Pub Date: 2020-07-31, DOI: 10.1007/s10619-020-07304-z
Yansong Zhang, Yu Zhang, Jiaheng Lu, Shan Wang, Zhuan Liu, Ruichen Han

GPUs have been considered one of the next-generation platforms for real-time query processing databases. In this paper we empirically demonstrate that representative GPU databases [e.g., OmniSci (Open Source Analytical Database & SQL Engine, https://www.omnisci.com/platform/omniscidb, 2019)] may be slower than representative in-memory databases [e.g., Hyper (Neumann and Leis, IEEE Data Eng Bull 37(1):3–11, 2014)] on typical OLAP workloads (the Star Schema Benchmark), even when the actual dataset of each query fits completely in GPU memory. We therefore argue that GPU database designs should not be one-size-fits-all: a general-purpose GPU database engine may not be well suited to OLAP workloads without carefully designed GPU memory assignment and GPU computing locality. To achieve better GPU OLAP performance, we need to re-organize the OLAP operators and re-optimize the OLAP model. In particular, we propose a 3-layer OLAP model that matches heterogeneous computing platforms. The core idea is to maximize the locality of data and computation on the hardware designated for each workload. We design a vector grouping algorithm for the data-intensive workload, which is shown to be better assigned to the CPU platform. We design a TOP-DOWN query plan tree strategy that guarantees optimal operation in the final stage and pushes the respective optimizations down to lower layers to obtain global optimization gains. With this strategy, we design a 3-stage processing model (the OLAP acceleration engine) for the hybrid CPU-GPU platform, in which the computing-intensive star-join stage is accelerated by the GPU and the data-intensive grouping & aggregation stage is handled by the CPU. This design maximizes the locality of the different workloads and simplifies the GPU acceleration implementation. Our experimental results show that, with vector grouping and the GPU-accelerated star-join implementation, the OLAP acceleration engine runs 1.9×, 3.05×, and 3.92× faster than Hyper, OmniSci GPU, and OmniSci CPU, respectively, in the SSB evaluation with an SF = 100 dataset.
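As a rough illustration of the division of labor the abstract describes (a minimal sketch, not the authors' implementation), the following CUDA fragment shows a GPU kernel for the computing-intensive star-join stage probing two per-dimension "group vectors", followed by a CPU loop for the data-intensive vector grouping & aggregation stage. All identifiers (star_join_kernel, vector_grouping, the group-vector encoding, dense 0-based surrogate keys) are illustrative assumptions.

```cuda
#include <cstdint>
#include <vector>

// Stage on GPU (computing-intensive star-join): each fact row probes the
// dimension group vectors by foreign key (assumed dense, 0-based surrogate
// keys) and emits a composed group index, or -1 if a dimension predicate
// filtered the row out.
__global__ void star_join_kernel(const int* __restrict__ fk_cust,
                                 const int* __restrict__ fk_supp,
                                 const int16_t* __restrict__ cust_group,  // group id or -1
                                 const int16_t* __restrict__ supp_group,  // group id or -1
                                 int n_supp_groups,
                                 int* __restrict__ group_index,
                                 int n_rows) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_rows) return;
    int16_t cg = cust_group[fk_cust[i]];
    int16_t sg = supp_group[fk_supp[i]];
    group_index[i] = (cg < 0 || sg < 0) ? -1 : cg * n_supp_groups + sg;
}

// Stage on CPU (data-intensive vector grouping & aggregation): the group
// index array produced by the GPU is reduced into a dense vector of group
// cells with a single sequential scan over the measure column.
std::vector<double> vector_grouping(const std::vector<int>& group_index,
                                    const std::vector<double>& measure,
                                    std::size_t num_groups) {
    std::vector<double> agg(num_groups, 0.0);
    for (std::size_t i = 0; i < group_index.size(); ++i) {
        int g = group_index[i];
        if (g >= 0) agg[g] += measure[i];
    }
    return agg;
}
```

The sketch only conveys the locality argument: the key-probing star-join is a good fit for massive GPU parallelism, while the wide sequential scan for grouping and aggregation stays on the CPU close to the fact-table measure columns.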
