High‐performance implementation of a two‐bit geohash coding technique for nearest neighbor search
Concurrency and Computation: Practice and Experience (IF 1.5), Pub Date: 2020-10-05, DOI: 10.1002/cpe.6029
Varalakshmi M, Amit P. Kesarkar, Daphne Lopez

Geohash coding algorithms open significant opportunities for various spatial applications. However, these algorithms require massive storage, complex bit manipulation, and extensive code modification when scaled to higher dimensions. In this article, we develop a two-bit geohash coding algorithm that divides the search space into four equal partitions, each assigned a two-bit label (00, 01, 10, or 11). Along a particular dimension, this labeling uniquely identifies a chosen data point and its two neighbors, one on either side. This feature simplifies the generation of geohash codes for the neighboring grid cells. In addition, the algorithm uses memory efficiently by storing the geohash values of the training points as integers. In experiments on climate data assimilation, model-to-observation space mapping with a geohash code length of 24 bits over the Lat-Lon extent of India achieved an accuracy of 85%. Performance and scalability evaluation of the proposed algorithm, optimized for multicore and many-core processors, shows significant speedups over a tree-based approach. This algorithm provides a foundation for new spatial statistical methods that can be used for pattern discovery and detection in spatial big data.
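The partitioning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`encode_dim`, `encode_point`, `neighbor_codes`), the choice to concatenate the two per-dimension codes rather than interleave them, and the coordinate ranges in the usage note are all assumptions made for illustration. Each level splits the current interval into four equal parts and appends the two-bit label of the part that holds the coordinate, so six levels per dimension give a 24-bit code for a latitude-longitude pair.

```python
def encode_dim(value, lo, hi, levels):
    """Pack `levels` two-bit partition labels for one coordinate into an int."""
    code = 0
    for _ in range(levels):
        width = (hi - lo) / 4.0
        # Label of the quarter holding the value: 0b00, 0b01, 0b10 or 0b11.
        # Clamped so that values on or beyond the bounds stay in range.
        label = max(0, min(int((value - lo) / width), 3))
        code = (code << 2) | label
        # Narrow the interval to the chosen quarter for the next level.
        lo = lo + label * width
        hi = lo + width
    return code


def encode_point(lat, lon, lat_range, lon_range, levels=6):
    """Concatenate the latitude and longitude codes (an assumed layout)."""
    lat_code = encode_dim(lat, *lat_range, levels)
    lon_code = encode_dim(lon, *lon_range, levels)
    return (lat_code << (2 * levels)) | lon_code


def neighbor_codes(code, levels):
    """Codes of a cell and its adjacent cells along one dimension."""
    last = (1 << (2 * levels)) - 1
    # In this sketch the per-dimension code equals the cell index, so the
    # neighbors are code - 1 and code + 1, dropped at the grid edges.
    return [c for c in (code - 1, code, code + 1) if 0 <= c <= last]
```

For example, `encode_dim(5.5, 0.0, 16.0, 2)` walks through the labels 01 and 01 and returns the integer 5, and `neighbor_codes(5, 2)` returns `[4, 5, 6]`. Storing each code as a plain integer mirrors the memory-saving choice the abstract describes, although the paper's exact bit layout may differ.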

Updated: 2020-10-05