One size does not fit all: accelerating OLAP workloads with GPUs

Distributed and Parallel Databases

Abstract

GPUs have been considered one of the next-generation platforms for real-time query processing in databases. In this paper we empirically demonstrate that a representative GPU database [e.g., OmniSci (Open Source Analytical Database & SQL Engine, https://www.omnisci.com/platform/omniscidb, 2019)] can be slower than a representative in-memory database [e.g., HyPer (Neumann and Leis, IEEE Data Eng Bull 37(1):3–11, 2014)] on typical OLAP workloads (the Star Schema Benchmark), even when the working set of each query fits entirely in GPU memory. We therefore argue that GPU database design should not be one-size-fits-all: a general-purpose GPU database engine may not suit OLAP workloads without carefully designed GPU memory assignment and GPU computing locality. To achieve better GPU OLAP performance, we need to re-organize the OLAP operators and re-optimize the OLAP model. In particular, we propose a 3-layer OLAP model that matches heterogeneous computing platforms; its core idea is to maximize data and computing locality on the designated hardware. We design a vector grouping algorithm for the data-intensive workload, which we show is best assigned to the CPU, and a top-down query plan tree strategy that guarantees the optimal operation in the final stage while pushing the respective optimizations down to the lower layers for global optimization gains. With this strategy, we design a 3-stage processing model (the OLAP acceleration engine) for hybrid CPU-GPU platforms, in which the computing-intensive star-join stage is accelerated by the GPU and the data-intensive grouping & aggregation stage is executed on the CPU. This design maximizes the locality of the different workloads and simplifies the GPU acceleration implementation.
Our experimental results show that, with vector grouping and the GPU-accelerated star-join implementation, the OLAP acceleration engine runs 1.9×, 3.05×, and 3.92× faster than HyPer, OmniSci GPU, and OmniSci CPU, respectively, in the SSB evaluation at scale factor SF = 100.
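The 3-stage processing model described above can be sketched, at toy scale, in plain Python. The tables, predicates, and group encodings below are illustrative assumptions, not the paper's implementation (which executes the star-join stage as GPU kernels over columnar data); the sketch only shows how dimension filtering produces vector indexes, how the star-join reduces to array-lookup probes, and how grouping & aggregation stays on the CPU.

```python
# Stage 1: filter each dimension table and build a "vector index" that
# maps a dimension's surrogate key (its row position) to a group value,
# or None when the row fails the query predicate.
d_region = ["ASIA", "EUROPE", "ASIA", "AMERICA"]   # dimension 1 (illustrative)
d_year   = [1997, 1998, 1997]                      # dimension 2 (illustrative)

vec_region = [r if r == "ASIA" else None for r in d_region]
vec_year   = [y if y in (1997, 1998) else None for y in d_year]

# Fact table: foreign-key columns into the two dimensions plus a measure.
fact_fk1 = [0, 1, 2, 3, 0, 2]
fact_fk2 = [0, 1, 2, 0, 1, 0]
fact_rev = [10, 20, 30, 40, 50, 60]

# Stage 2 (star-join): probe every fact row against the vector indexes.
# Each probe is a positional array lookup, so this stage is uniform and
# data-parallel -- the computing-intensive part offloaded to the GPU.
groups = {}
for fk1, fk2, rev in zip(fact_fk1, fact_fk2, fact_rev):
    g1, g2 = vec_region[fk1], vec_year[fk2]
    if g1 is None or g2 is None:
        continue                      # fact row eliminated by the star-join
    # Stage 3 (grouping & aggregation, kept on the CPU): accumulate the
    # measure under the combined group key from the surviving rows.
    key = (g1, g2)
    groups[key] = groups.get(key, 0) + rev

print(sorted(groups.items()))
```

The design point the sketch illustrates is the locality split: stage 2 touches only fixed-width key columns with no data-dependent control flow, while stage 3 is dominated by irregular, memory-bound updates to the group table, which the paper argues fit CPU caches better than GPU memory.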

References

  1. Neumann, T., Leis, V.: Compiling database queries into machine code. IEEE Data Eng. Bull. 37(1), 3–11 (2014)

  2. Open Source Analytical Database & SQL Engine. (2019). https://www.omnisci.com/platform/omniscidb

  3. HGX-2 Fuses HPC and AI Computing Architectures. (2018). https://devblogs.nvidia.com/hgx-2-fuses-ai-computing/

  4. Kinetica is the insight engine for the Extreme Data Economy. (2018). https://www.kinetica.com/

  5. SQream DB is the GPU Data Warehouse for massive data. (2018). https://sqream.com/

  6. FASTEST, SMARTEST-AI-Oriented Data Processing Platform. (2018). https://zilliz.com/

  7. PG-Strom - Master development repository. (2017). https://github.com/heterodb/pg-strom

  8. Balkesen, C., Teubner, J., Alonso, G., Tamer Özsu, M.: Main-memory hash joins on multi-core CPUs: tuning to the underlying hardware. ICDE 2013, 362–373 (2013)

  9. Balkesen, C., Alonso, G., Teubner, J., Tamer Özsu, M.: Multi-core, main-memory joins: sort vs hash revisited. PVLDB 7(1), 85–96 (2013)

  10. Richter, S., Alvarez, V., Dittrich, J.: A seven-dimensional analysis of hashing methods and its implications on query processing. PVLDB 9(3), 96–107 (2015)

  11. Schuh, S., Chen, X., Dittrich, J.: An experimental comparison of thirteen relational equi-joins in main memory. SIGMOD Conference 2016, pp. 1961–1976 (2016)

  12. Cheng, X., He, B., Lu, M., Lau, C.T., Huynh, H.P., Goh, R.S.M.: Efficient query processing on many-core architectures: a case study with intel Xeon Phi Processor. SIGMOD Conference 2016, pp. 2081–2084 (2016)

  13. Cheng, X., He, B., Du, X., Lau, C.T.: A study of main-memory hash joins on many-core processor: a case with Intel knights landing architecture. CIKM 2017, pp. 657–666 (2017)

  14. He, B., Yang, K., Fang, R., Lu, M., Govindaraju, N.K., Luo, Q., Sander, P.V.: Relational joins on graphics processors. SIGMOD Conference 2008, pp. 511–524 (2008)

  15. Rui, R., Li, H., Tu, Y.-C.: Join algorithms on GPUs: a revisit after seven years. Big Data 2015, 2541–2550 (2015)

  16. He, J., Lu, M., He, B.: Revisiting co-processing for hash joins on the coupled CPU-GPU architecture. PVLDB 6(10), 889–900 (2013)

  17. Halstead, R.J., Absalyamov, I., Najjar, W.A., Tsotras, V.J.: FPGA-based Multithreading for In-Memory Hash Joins. CIDR (2015)

  18. Kara, K., Giceva, J., Alonso, G.: FPGA-based Data Partitioning. SIGMOD Conference 2017, pp. 433–445 (2017)

  19. MapD Technical Whitepaper. The world’s fastest platform for data exploration. (2018). https://go3.mapd.com/resources/whitepapers/mapd/lp

  20. Sompolski, J., Zukowski, M., Boncz, P.A.: Vectorization vs. compilation in query execution. DaMoN 2011, pp. 33–40 (2011)

  21. Furst, E., Oskin, M., Howe, B.: Profiling a GPU database implementation: a holistic view of GPU resource utilization on TPC-H queries. DaMoN 2017, pp. 3:1–3:6 (2017)

  22. Boncz, P.A., Kersten, M.L., Manegold, S.: Breaking the memory wall in MonetDB. Commun. ACM 51(12), 77–85 (2008)

  23. Breß, S., Heimel, M., Siegmund, N., Bellatreche, L., Saake, G.: GPU-accelerated database systems: survey and open challenges. Trans. Large-Scale Data Knowl. Cent. Syst. 15, 1–35 (2014)

  24. Zukowski, M., Boncz, P.A., Nes, N., Héman, S.: MonetDB/X100: a DBMS in the CPU cache. IEEE Data Eng. Bull. 28(2), 17–22 (2005)

  25. Zhang, Y., Zhou, X., Zhang, Y., Zhang, Y., Su, M., Wang, S.: Virtual denormalization via array index reference for main memory OLAP. IEEE Trans. Knowl. Data Eng. 28(4), 1061–1074 (2016)

  26. Zhang, Y., Zhang, Y., Wang, S., Lu, J.: Fusion OLAP: fusing the pros of MOLAP and ROLAP together for in-memory OLAP. IEEE Trans. Knowl. Data Eng. (2018). https://doi.org/10.1109/TKDE.2018.2867522

  27. HyPer—a hybrid OLTP&OLAP high performance DBMS. (2015). https://hyper-db.com/

  28. https://docs.omnisci.com/latest/4_centos7-yum-gpu-os-recipe.html (2019)

  29. https://github.com/heterodb/pg-strom (2016)

  30. https://zilliz.com/docs/analytics_overview (2019)

  31. https://www.top500.org/lists/2018/06/ (2018)

  32. https://www.top500.org/green500/lists/2019/11/ (2019)

  33. GPU-Accelerated Analytics on your Data Lake. (2018). https://blazingdb.com/

  34. Paul, J., He, J., He, B.: GPL: a GPU-based pipelined query processing engine. SIGMOD Conference 2016 (2016)

  35. Funke, H., Breß, S., Noll, S., Markl, V., Teubner, J.: Pipelined query processing in coprocessor environments. SIGMOD Conference 2018, pp. 1603–1618 (2018)

  36. Chrysogelos, P., Karpathiotakis, M., Appuswamy, R., Ailamaki, A.: HetExchange: encapsulating heterogeneous CPU-GPU parallelism in JIT compiled engines. PVLDB 12(5), 544–556 (2019)

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61772533, 61732014), the Natural Science Foundation of Beijing (Grant No. 4192066), and the Academy of Finland, Finland (Grant No. s310321). We also thank Intel in China for the cooperation project.

Author information

Corresponding author

Correspondence to Yu Zhang.

About this article

Cite this article

Zhang, Y., Zhang, Y., Lu, J. et al. One size does not fit all: accelerating OLAP workloads with GPUs. Distrib Parallel Databases 38, 995–1037 (2020). https://doi.org/10.1007/s10619-020-07304-z
