

Big high-dimension data cube designs for hybrid memory systems
Knowledge and Information Systems (IF 2.5), Pub Date: 2020-08-26, DOI: 10.1007/s10115-020-01505-9
Rodrigo Rocha Silva, Celso Massaki Hirata, Joubert de Castro Lima

In Big Data cubes with hundreds of dimensions and billions of tuples, indexing and querying are challenging because computing a full cube has exponential time and space complexity. RAM-only solutions may therefore be impractical, and solutions based on hybrid memory (RAM and disk) become viable alternatives. In this paper, we propose a hybrid approach, named bCubing, to index and query high-dimension data cubes with a large number of tuples on a single machine using both RAM and disk. We evaluated bCubing in terms of runtime and memory consumption, comparing it with the Frag-Cubing, HIC and H-Frag approaches. bCubing was faster and used less RAM than all three. It indexed and supported queries over a data cube with 1.2 billion tuples and 60 dimensions while consuming only 84 GB of RAM, 35% less memory than HIC. The complex holistic measures mode and median were computed in multidimensional queries, for which bCubing was, on average, 50% faster than HIC.
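To make the scale of the problem concrete, the following is a minimal Python sketch. It is not the bCubing algorithm, whose design the abstract does not describe. It assumes a cube without dimension hierarchies, so that d dimensions yield 2^d cuboids, and it uses a generic in-memory inverted index over tuple IDs to answer a point query with the holistic measures median and mode. The function names (`cuboid_count`, `build_inverted_index`, `point_query`) and the toy data are hypothetical and chosen only for illustration.

```python
from statistics import median, mode


def cuboid_count(num_dimensions: int) -> int:
    """Number of group-bys (cuboids) in a full cube, assuming no hierarchies."""
    return 2 ** num_dimensions


def build_inverted_index(rows):
    """Map (dimension position, value) to the set of tuple IDs holding that value."""
    index = {}
    for tid, (dims, _measure) in enumerate(rows):
        for pos, value in enumerate(dims):
            index.setdefault((pos, value), set()).add(tid)
    return index


def point_query(rows, index, constraints):
    """Return the measures of tuples that match every {position: value} constraint."""
    tid_sets = [index.get((pos, val), set()) for pos, val in constraints.items()]
    # With no constraints, every tuple qualifies.
    matching = set.intersection(*tid_sets) if tid_sets else set(range(len(rows)))
    return [rows[tid][1] for tid in sorted(matching)]


# Toy relation (hypothetical data): three dimensions and one numeric measure.
rows = [
    (("a1", "b1", "c1"), 10),
    (("a1", "b2", "c1"), 20),
    (("a1", "b1", "c2"), 20),
    (("a2", "b1", "c1"), 40),
]
index = build_inverted_index(rows)

# Holistic measures need the matching measure values themselves,
# not just partial aggregates, so the query returns the full list.
measures = point_query(rows, index, {0: "a1"})
print(cuboid_count(60))
print(median(measures), mode(measures))
```

Because median and mode cannot be combined from partial sums or counts, a query of this kind has to reach the individual measure values, which is one reason memory consumption matters for cubes of the size reported above.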


Updated: 2020-08-26
