Towards multi-purpose main-memory storage structures: Exploiting sub-space distance equalities in totally ordered data sets for exact knn queries
Information Systems ( IF 3.7 ) Pub Date : 2021-05-12 , DOI: 10.1016/j.is.2021.101791
Martin Schäler , Christine Tex , Veit Köppen , David Broneske , Gunter Saake

Efficient knn computation for high-dimensional data is an important yet challenging task. Today, most information systems use a column-store back-end for relational data. For such systems, multi-dimensional indexes that accelerate selections are known, but they cannot be used to accelerate knn queries. Consequently, one relies on sequential scans, specialized knn indexes, or trades result quality for speed. To avoid storing one specialized index per query type, we envision multi-purpose indexes that efficiently support multiple query types. In this paper, as a first step towards this goal, we focus on additionally supporting knn queries. To this end, we study how to exploit total orders for accelerating knn queries, based on the sub-space distance equality observation: points that are distinct in the full space but are projected to the same point in a sub-space have the same distance to every other point in that sub-space. If one can easily find these equalities and tune storage structures towards them, two effects can be exploited to accelerate knn queries. The first effect allows pruning groups of points based on a cascade of lower bounds. The second allows re-using previously computed sub-space distances between point groups, which results in a worst-case execution bound that is independent of the distance function. We present knn algorithms exploiting both effects and show how to tune a storage structure already known to work well for multi-dimensional selections. Our investigations reveal that the effects are robust to increases in, e.g., the dimensionality, suggesting generally good knn performance. Comparing our knn algorithms to well-known competitors reveals performance improvements of up to one order of magnitude. Furthermore, the algorithms deliver performance at least comparable to the next fastest competitor, suggesting that they are only marginally affected by the curse of dimensionality.
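The two effects described above can be illustrated with a minimal sketch. The point values below are hypothetical, and Euclidean distance is assumed (under which a sub-space distance is a lower bound on the full-space distance); this is not the paper's actual storage structure or algorithm, only a toy demonstration of the observation.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two points that differ in the full space but coincide in the
# sub-space spanned by the first two dimensions.
p1 = (1.0, 2.0, 5.0)
p2 = (1.0, 2.0, 9.0)
q  = (4.0, 6.0, 0.0)

sub = slice(0, 2)

# Effect 2 (re-use): the sub-space distance is shared by the whole
# point group {p1, p2}, so it needs to be computed only once.
d_sub = dist(p1[sub], q[sub])
assert d_sub == dist(p2[sub], q[sub])

# Effect 1 (pruning): for Euclidean distance, d_sub is a lower bound
# on the full-space distances. If it already exceeds the current
# k-th best distance, the whole group is pruned without computing
# any full-space distance.
kth_best = 4.0  # hypothetical current k-th nearest distance
if d_sub >= kth_best:
    print("group pruned")
```

Cascading such lower bounds over nested sub-spaces lets a knn search discard entire groups early, which is what makes the worst-case bound independent of the (full-space) distance function.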




Updated: 2021-05-25