EI-LSH: An early-termination driven I/O efficient incremental c-approximate nearest neighbor search
The VLDB Journal (IF 4.2) Pub Date: 2020-09-30, DOI: 10.1007/s00778-020-00635-4
Wanqi Liu, Hanchen Wang, Ying Zhang, Wei Wang, Lu Qin, Xuemin Lin

Nearest neighbor (NN) search in high-dimensional space is widely used in fields such as databases, data mining and machine learning. The problem is well solved in low-dimensional space, but in high-dimensional space it remains challenging because of the curse of dimensionality. As a trade-off between accuracy and efficiency, c-approximate nearest neighbor (c-ANN) search is considered instead of exact NN search in high-dimensional space. A variety of c-ANN algorithms have been proposed. One important scheme for the c-ANN problem is locality-sensitive hashing (LSH), which projects a high-dimensional dataset into a low-dimensional one and returns a c-ANN with constant probability. In this paper, we propose a new aggressive early-termination (ET) condition that stops an LSH-based algorithm earlier under the same theoretical guarantee, leading to a smaller I/O cost and less running time. Unlike the “conservative” early-termination conditions used in previous studies, our “aggressive” condition can stop much earlier. Although it is not absolutely safe and introduces a probability of failure, we can still devise more efficient algorithms under the same theoretical guarantee by carefully accounting for the failure probabilities introduced by both the LSH scheme and early termination. Furthermore, we introduce an incremental search strategy. Unlike previous LSH methods, which expand the bucket width exponentially, we employ a more natural strategy that incrementally accesses the hash values of the objects. We also provide a rigorous theoretical analysis to underpin the incremental search strategy and the new early-termination technique. Our comprehensive experimental results show that, compared with state-of-the-art I/O-efficient c-ANN techniques, the proposed algorithm, EI-LSH, achieves much better I/O efficiency under the same theoretical guarantee.
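
The general idea of incremental access combined with an aggressive stop can be illustrated with a small sketch. This is not the EI-LSH algorithm itself: the abstract does not state the actual termination condition, so the stopping rule below (and the names `make_projections`, `incremental_ann`, and `stop_ratio`) are hypothetical choices made only for illustration. The sketch projects points with random Gaussian vectors, visits candidates in order of increasing projected distance to the query, and stops once the next candidate's projected distance is large relative to the best true distance found so far.

```python
import heapq
import math
import random


def make_projections(dim, m, seed=0):
    """Draw m random Gaussian vectors (a common 2-stable LSH projection)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(m)]


def project(point, projections):
    """Map a point to its m hash values (dot products with each vector)."""
    return [sum(a * x for a, x in zip(vec, point)) for vec in projections]


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def incremental_ann(data, query, projections, c=2.0, stop_ratio=1.0):
    """Return (index, distance, visited) for an approximate nearest neighbor.

    ASSUMPTION: the stop rule is an illustrative heuristic, not the paper's
    condition. It relies on projected distances being roughly sqrt(m) times
    the true distance on average, so it can fail with some probability.
    """
    m = len(projections)
    q_hash = project(query, projections)
    # An in-memory heap stands in for an on-disk index that would stream
    # candidates in this order.
    heap = [(euclidean(project(p, projections), q_hash), i)
            for i, p in enumerate(data)]
    heapq.heapify(heap)

    best_idx, best_dist, visited = None, math.inf, 0
    while heap:
        proj_dist, i = heapq.heappop(heap)
        # Aggressive stop: every remaining candidate is at least this far
        # away in the projected space.
        if proj_dist > stop_ratio * math.sqrt(m) * best_dist / c:
            break
        visited += 1
        d = euclidean(data[i], query)
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx, best_dist, visited
```

Setting `stop_ratio=math.inf` disables the stop and degenerates to an exact scan, which is a convenient baseline for measuring how many candidate accesses a more aggressive setting saves and how often it misses a c-approximate answer.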




Updated: 2020-10-02