Information Systems (IF 2.466) Pub Date: 2020-07-25, DOI: 10.1016/j.is.2020.101602 Ilhyun Suh; Yon Dohn Chung
Multidimensional data are used in many application areas such as scientific data analysis, business intelligence, and geographic information systems. One of the most frequent operations on such data is the selection of a subspace of the given multidimensional space, which involves predicate evaluation on multiple dimensions. Existing main-memory data layouts optimized for evaluating predicates on columnar data can accelerate subspace extraction by performing filter scans sequentially, one dimension at a time. However, optimization opportunities emerge if all predicates are considered together. In this paper, we propose DimensionSlice, a new main-memory data layout optimized for evaluating predicates on multiple dimensions. More specifically, the dimension values are sliced into portions, and the portions of the same order from each dimension are arranged together. Multiple predicates are evaluated simultaneously on the sliced dimension values during the scan. In addition, because the different portions are stored separately, loads and computations on the lower portions can be skipped whenever the evaluation result is already assured after examining the upper portions. To further accelerate scans, the DimensionSlice layout is designed to easily leverage the SIMD capabilities available on most mainstream processors. Through experiments, we demonstrate the performance gains of the proposed method over a columnar main-memory layout that evaluates the partial predicates one dimension at a time. We also show that the proposed method outperforms the state-of-the-art multidimensional index structure once the selectivity exceeds a very low threshold.
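The slicing and early-termination idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes 16-bit unsigned dimension values split into two 8-bit portions, a conjunction of simple `v < bound` predicates, and plain Python lists instead of a SIMD-friendly memory layout. The names `build_sliced_layout` and `scan_less_than` are hypothetical and chosen only for this illustration.

```python
from typing import List, Sequence, Tuple

PORTION_BITS = 8  # assumption: 16-bit values, two 8-bit portions
MASK = (1 << PORTION_BITS) - 1


def build_sliced_layout(
    rows: Sequence[Sequence[int]],
) -> Tuple[List[List[int]], List[List[int]]]:
    """Slice every dimension value into an upper and a lower portion.

    upper[i] holds the upper portions of all dimensions of row i and
    lower[i] the lower portions, so portions of the same order sit
    together and the two orders are stored separately.
    """
    upper = [[v >> PORTION_BITS for v in row] for row in rows]
    lower = [[v & MASK for v in row] for row in rows]
    return upper, lower


def scan_less_than(
    upper: List[List[int]],
    lower: List[List[int]],
    bounds: Sequence[int],
) -> Tuple[List[int], int]:
    """Evaluate the conjunction row[d] < bounds[d] over all dimensions d.

    Returns the matching row ids and the number of rows for which the
    lower portions had to be read.
    """
    b_up = [b >> PORTION_BITS for b in bounds]
    b_lo = [b & MASK for b in bounds]
    matches: List[int] = []
    lower_reads = 0
    for i, ups in enumerate(upper):
        # All predicates are checked together on the upper portions.
        if any(u > bu for u, bu in zip(ups, b_up)):
            continue  # one dimension already fails: the result is assured
        undecided = [d for d, (u, bu) in enumerate(zip(ups, b_up)) if u == bu]
        if not undecided:
            matches.append(i)  # every dimension already passes
            continue
        # Only now are the lower portions needed, and only for ties.
        lower_reads += 1
        if all(lower[i][d] < b_lo[d] for d in undecided):
            matches.append(i)
    return matches, lower_reads
```

For the rows `(100, 200)`, `(300, 50)`, `(1000, 1000)`, `(511, 255)`, `(512, 10)` and bounds `(512, 256)`, only the last row ties with a bound on an upper portion, so it is the only row whose lower portions are read. A real layout would process many rows per SIMD instruction rather than looping row by row as this sketch does.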