DimensionSlice: A main-memory data layout for fast scans of multidimensional data, Information Systems



Information Systems (IF 3.0) Pub Date: 2020-07-25, DOI: 10.1016/j.is.2020.101602
Ilhyun Suh, Yon Dohn Chung

Multidimensional data are exploited in many application areas such as scientific data analysis, business intelligence, and geographic information systems. One of the most frequent operations applied to such multidimensional data is the selection of a subspace of the given multidimensional space, which involves predicate evaluation on multiple dimensions. Existing main-memory data layouts optimized for evaluating predicates on the columnar data can be used to accelerate the subspace extraction by sequentially performing filter scans on each dimension one at a time. However, optimization opportunities emerge if we can consider all predicates together. In this paper, we propose DimensionSlice, a new main-memory data layout optimized for evaluating predicates on multiple dimensions. More specifically, the dimension values are sliced into portions and the portions with the same order of each dimension are arranged together. Multiple predicates are simultaneously evaluated with the sliced dimension values during the scan. In addition, by storing the different portions separately, unnecessary loads and computations of lower portions can be eliminated if the evaluation results are assured after examining the upper portions. For further acceleration of scans, the DimensionSlice layout is designed to easily leverage the SIMD capabilities that most mainstream processors are equipped with. Through experiments, we demonstrate the performance gains of the proposed method over the columnar main-memory layout that evaluates the partial predicates one dimension at a time. We also show that the proposed method outperforms the state-of-the-art multidimensional index structure when the selectivity is over a very low threshold.
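
The slicing idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes unsigned 16-bit dimension values split into two 8-bit portions, a conjunction of predicates of the form `value < bound`, and a plain scalar loop standing in for the SIMD comparison. The names `build_layout` and `scan_less_than` are hypothetical and chosen only for this illustration.

```python
from typing import List, Sequence, Tuple

BITS = 8                    # assumed portion width; values are assumed to fit in 16 bits
MASK = (1 << BITS) - 1


def build_layout(rows: Sequence[Sequence[int]]) -> Tuple[List[List[int]], List[List[int]]]:
    """Slice every dimension value into an upper and a lower 8-bit portion.

    Portions of the same order are kept together: upper[i] holds the upper
    portions of all dimensions of row i, lower[i] holds the lower portions.
    """
    upper = [[v >> BITS for v in row] for row in rows]
    lower = [[v & MASK for v in row] for row in rows]
    return upper, lower


def scan_less_than(upper, lower, bounds) -> Tuple[List[int], int]:
    """Evaluate the conjunction row[d] < bounds[d] over all dimensions d.

    Returns the matching row ids and the number of rows for which the
    lower portions had to be examined at all.
    """
    b_up = [b >> BITS for b in bounds]
    b_lo = [b & MASK for b in bounds]
    matches: List[int] = []
    lower_loads = 0
    for i, ups in enumerate(upper):
        # All predicates are checked together on the upper portions.
        if any(u > bu for u, bu in zip(ups, b_up)):
            continue  # at least one predicate is certainly false
        # Only dimensions whose upper portion ties with the bound stay undecided.
        undecided = [d for d, (u, bu) in enumerate(zip(ups, b_up)) if u == bu]
        if not undecided:
            matches.append(i)  # every predicate is certainly true
            continue
        lower_loads += 1  # the lower portions are needed only in this case
        if all(lower[i][d] < b_lo[d] for d in undecided):
            matches.append(i)
    return matches, lower_loads


rows = [(100, 200), (300, 50), (1000, 1000), (511, 257), (512, 10)]
upper, lower = build_layout(rows)
matches, lower_loads = scan_less_than(upper, lower, bounds=(512, 258))
```

In this small example only two of the five rows need their lower portions examined; the other three are decided from the upper portions alone, which is the kind of skipped load and computation the abstract refers to.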


Updated: 2020-07-25
