CirroData: Yet Another SQL-on-Hadoop Data Analytics Engine with High Performance
Journal of Computer Science and Technology (IF 1.2), Pub Date: 2020-01-01, DOI: 10.1007/s11390-020-9536-z
Zheng-Hao Jin, Haiyang Shi, Ying-Xin Hu, Li Zha, Xiaoyi Lu

This paper presents CirroData, a high-performance SQL-on-Hadoop system designed for Big Data analytics workloads. CirroData is a home-grown, enterprise-level online analytical processing (OLAP) system backed by more than seven years of research and development (R&D) experience, and we share with the community the design details of how it achieves high performance. The paper discusses multiple optimization techniques, whose effectiveness and efficiency have been demonstrated by our customers’ daily usage. It also presents benchmark-level studies and several real application case studies of CirroData. Our evaluations show that CirroData can outperform various counterpart database systems in the community, such as “Spark+Hive”, “Spark+HBase”, Impala, DB-X/Y, Greenplum, HAWQ, and others. On the standard TPC-H queries, CirroData achieves up to 4.99x speedup compared with Greenplum, HAWQ, and Spark. Application-level evaluations demonstrate that CirroData outperforms “Spark+Hive” and “Spark+HBase” by up to 8.4x and 38.8x, respectively. For some application workloads, CirroData also achieves speedups of up to 20x, 100x, 182.5x, 92.6x, and 55.5x compared with Greenplum, DB-X, Impala, DB-Y, and HAWQ, respectively.

Updated: 2020-01-01