Bitpart: Exact metric search in high(er) dimensions
Information Systems ( IF 3.0 ) Pub Date : 2020-02-04 , DOI: 10.1016/j.is.2020.101493
Alan Dearle , Richard Connor

We define BitPart (Bitwise representations of binary Partitions), a novel exact search mechanism intended for use in high-dimensional spaces. In outline, a fixed set of reference objects is used to define a large set of regions within the original space, and each data item is characterised according to its containment within these regions. In contrast with other mechanisms, only a subset of this information is selected, according to the query, before a search within the re-cast space is performed. Partial data representations are accessed only if they are known to be potentially useful in calculating the exact query solution.
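The outline above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes Euclidean distance, ball-shaped regions around each reference object, a threshold (range) query, and Python integers as bitmaps. The names `build_regions` and `range_search` and the chosen radii are hypothetical.

```python
import math
import random


def build_regions(data, refs, radii):
    """For each reference object p and radius r, store a bitmap whose
    bit i is set iff data[i] lies inside the ball of radius r around p."""
    regions = []
    for p in refs:
        dists = [math.dist(p, s) for s in data]
        balls = []
        for r in radii:
            bits = 0
            for i, d in enumerate(dists):
                if d <= r:
                    bits |= 1 << i
            balls.append((r, bits))
        regions.append((p, balls))
    return regions


def range_search(q, t, data, regions):
    """Return indices of all items within distance t of q (exact)."""
    full = (1 << len(data)) - 1
    candidates = full
    for p, balls in regions:
        dqp = math.dist(q, p)
        for r, bits in balls:
            if dqp + t <= r:
                # Query ball lies wholly inside the region, so by the
                # triangle inequality every solution has this bit set.
                candidates &= bits
            elif dqp - t > r:
                # Query ball lies wholly outside the region, so every
                # solution has this bit clear.
                candidates &= full & ~bits
            # Otherwise the region straddles the query ball and its
            # bitmap is not consulted at all.
    # Check the surviving candidates against the original space.
    return [i for i, s in enumerate(data)
            if (candidates >> i) & 1 and math.dist(q, s) <= t]


if __name__ == "__main__":
    rng = random.Random(1)
    data = [tuple(rng.random() for _ in range(6)) for _ in range(200)]
    regions = build_regions(data, data[:8], [0.6, 0.9, 1.2])
    query = tuple(rng.random() for _ in range(6))
    print(range_search(query, 0.5, data, regions))
```

Only bitmaps for regions that wholly contain or wholly exclude the query ball are combined, which mirrors the idea of selecting a query-dependent subset of the stored information. The final distance check keeps the result exact.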

Our mechanism requires Ω(N log N) space to evaluate a query, where N is the cardinality of the data, and therefore does not scale as well as previously defined mechanisms with low-dimensional data. However, it has recently been shown that, for a nearest neighbour search in high dimensions, a sequential scan of the data is essentially unavoidable. This result has been suspected for a long time, and has been referred to as the curse of dimensionality in this context.

In the light of this result, the compromise achieved by this work is to make the best possible use of the available fast memory, and to offer great potential for parallel query evaluation. To our knowledge, it gives the best compromise currently known for performing exact search over data whose dimensionality is too high to allow the useful application of metric indexing, yet is still sufficiently low to give at least some traction from the metric and supermetric properties.




Updated: 2020-04-21