An efficient algorithm for spatio-textual location matching
Distributed and Parallel Databases (IF 1.5), Pub Date: 2020-04-30, DOI: 10.1007/s10619-020-07289-9
Ning Wang, Jianping Zeng, Mingming Chen, Shunzhi Zhu

Geospatial location matching plays a significant role in spatial databases. In this paper, we propose and study a novel parallel spatio-textual location matching (STLM) query. Given two sets P and Q of spatial locations with textual attributes and a spatio-textual matching threshold $$\theta$$, the STLM query finds all location pairs whose spatio-textual similarity exceeds $$\theta$$. We believe that the STLM query is useful in many applications, such as important location/hot region detection, duplicate spatio-textual data cleaning, and location-based services in general. The STLM query is challenging for three reasons: (1) how to evaluate the spatio-textual similarity between two locations in practice, (2) how to prune the search space effectively in both the spatial and textual domains, and (3) how to process the STLM query in parallel, given its high computational complexity. To overcome these challenges, we develop a novel direct matching (DM) search algorithm. A linear combination method is adopted to combine spatial proximity and textual similarity. To further improve query efficiency, we develop a grid-based expansion scheduling scheme built on a purpose-built grid index structure. We conduct extensive experiments on real and synthetic spatio-textual data sets to verify the performance of the developed algorithms.
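
To make the query definition concrete, the following is a minimal brute-force sketch of an STLM-style query. It is not the DM algorithm or the grid-based scheme from the paper, which the abstract does not specify in detail. Everything named here is an assumption for illustration: the weight `alpha`, the normalization distance `d_max`, the use of Euclidean distance for spatial proximity, and the use of Jaccard similarity over keyword sets for textual similarity.

```python
from math import hypot


def jaccard(a, b):
    """Jaccard similarity of two keyword sets (an assumed textual measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(p, q, alpha, d_max):
    """Linear combination of spatial proximity and textual similarity.

    Each location is a tuple (x, y, keywords). alpha and d_max are
    hypothetical parameters, not values taken from the paper.
    """
    dist = hypot(p[0] - q[0], p[1] - q[1])
    spatial = max(0.0, 1.0 - dist / d_max)  # 1 at distance 0, 0 beyond d_max
    textual = jaccard(p[2], q[2])
    return alpha * spatial + (1.0 - alpha) * textual


def stlm_bruteforce(P, Q, theta, alpha=0.5, d_max=10.0):
    """Return index pairs (i, j) whose similarity strictly exceeds theta.

    This checks all |P| * |Q| pairs and applies no pruning.
    """
    return [
        (i, j)
        for i, p in enumerate(P)
        for j, q in enumerate(Q)
        if similarity(p, q, alpha, d_max) > theta
    ]


# Toy data, invented for this example.
P = [(0.0, 0.0, {"cafe", "wifi"}), (9.0, 9.0, {"museum"})]
Q = [(0.0, 0.0, {"cafe", "wifi"}), (100.0, 100.0, {"museum"})]
matches = stlm_bruteforce(P, Q, theta=0.6)
```

The quadratic pair enumeration in this sketch is the cost that the pruning and parallel processing described in the abstract aim to avoid.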

Updated: 2020-04-30