Efficient locality-sensitive hashing over high-dimensional streaming data
Neural Computing and Applications (IF 6), Pub Date: 2020-09-17, DOI: 10.1007/s00521-020-05336-1
Hao Wang, Chengcheng Yang, Xiangliang Zhang, Xin Gao

Approximate nearest neighbor (ANN) search in high-dimensional spaces is fundamental in many applications. Locality-sensitive hashing (LSH) is a well-known methodology to solve the ANN problem. Existing LSH-based ANN solutions typically employ a large number of individual indexes optimized for searching efficiency. Updating such indexes might be impractical when processing high-dimensional streaming data. In this paper, we present a novel disk-based LSH index that offers efficient support for both searches and updates. The contributions of our work are threefold. First, we use the write-friendly LSM-trees to store the LSH projections to facilitate efficient updates. Second, we develop a novel estimation scheme to estimate the number of required LSH functions, with which the disk storage and access costs are effectively reduced. Third, we exploit both the collision number and the projection distance to improve the efficiency of candidate selection, improving the search performance with theoretical guarantees on the result quality. Experiments on four real-world datasets show that our proposal outperforms the state-of-the-art schemes.
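
The collision-number idea mentioned in the abstract can be illustrated with a minimal in-memory sketch. It is not the authors' index. It assumes standard p-stable (Gaussian) LSH functions of the form floor((a·x + b) / w), uses plain dictionaries in place of LSM-trees, and ranks candidates by exact Euclidean distance rather than by projection distance. The class name `CollisionCountingIndex` and the parameters `num_functions`, `bucket_width`, and `min_collisions` are hypothetical names chosen for illustration.

```python
import math
import random
from collections import Counter


def make_hash_functions(dim, num_functions, bucket_width, seed=0):
    """Draw (a, b) pairs for p-stable LSH: h(x) = floor((a.x + b) / w)."""
    rng = random.Random(seed)
    funcs = []
    for _ in range(num_functions):
        a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # Gaussian projection vector
        b = rng.uniform(0.0, bucket_width)             # random offset in [0, w)
        funcs.append((a, b))
    return funcs


def bucket(point, func, bucket_width):
    """Return the bucket id of a point under one hash function."""
    a, b = func
    projection = sum(ai * xi for ai, xi in zip(a, point)) + b
    return math.floor(projection / bucket_width)


class CollisionCountingIndex:
    """Toy index: one dict per hash function (a stand-in for on-disk storage)."""

    def __init__(self, dim, num_functions=16, bucket_width=4.0, seed=0):
        self.w = bucket_width
        self.funcs = make_hash_functions(dim, num_functions, bucket_width, seed)
        self.tables = [dict() for _ in self.funcs]  # bucket id -> list of point ids
        self.points = {}

    def insert(self, pid, point):
        """Append the point id to its bucket in every table."""
        self.points[pid] = point
        for func, table in zip(self.funcs, self.tables):
            table.setdefault(bucket(point, func, self.w), []).append(pid)

    def query(self, q, k=1, min_collisions=8):
        """Keep points sharing a bucket with q in at least min_collisions
        tables, then rank those candidates by exact Euclidean distance."""
        counts = Counter()
        for func, table in zip(self.funcs, self.tables):
            for pid in table.get(bucket(q, func, self.w), []):
                counts[pid] += 1
        candidates = [pid for pid, c in counts.items() if c >= min_collisions]
        candidates.sort(key=lambda pid: math.dist(q, self.points[pid]))
        return candidates[:k]
```

In this sketch an insert only appends to one bucket per hash function, which is the kind of write pattern that append-oriented structures such as LSM-trees are designed for. The threshold `min_collisions` trades recall against the number of exact distance computations, and the value used here is arbitrary.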




Updated: 2020-09-18