An Efficient Algorithm for Spatio-Textual Object Cluster Join
Big Data Research (IF 3.5), Pub Date: 2021-02-05, DOI: 10.1016/j.bdr.2021.100191
Mingming Chen, Ning Wang, Daxin Zhu, Jedi S. Shang

With the proliferation of GPS-enabled devices and location-based services, spatio-textual objects play an indispensable role in spatial data management, and supporting join operations among groups of spatio-textual objects is of great importance. In this paper, we study a novel problem, the spatio-textual object cluster join (STOC-Join). Given two sets of spatio-textual objects D1 and D2 and a similarity threshold θ, STOC-Join finds all pairs of object clusters whose spatio-textual similarity is no less than θ. STOC-Join is practical in a variety of application scenarios, including location-based event detection, location-based data cleaning, and the pre-processing of location-based social media data in general. Efficient processing of STOC-Join poses three challenges: (1) how to define and effectively compute the spatio-textual similarity between two clusters of spatio-textual objects; (2) how to efficiently cluster a large number of spatio-textual objects; and (3) how to efficiently find similar cluster pairs and filter out unqualified candidate pairs. To address these challenges, we define an effective and easy-to-compute similarity metric that measures the aggregated similarity between two groups of spatio-textual objects. Based on this metric, we propose a novel two-phase matching algorithm that clusters a large number of spatio-textual objects and finds all qualifying cluster pairs efficiently. Experiments on large real-life datasets confirm that the proposed two-phase matching algorithm achieves high efficiency compared with straightforward methods.
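
To make the problem statement concrete, the following is a minimal sketch of what a straightforward (brute-force) join over already-formed clusters could look like. It is not the paper's two-phase algorithm, and the abstract does not give the paper's similarity metric. Everything here is an assumption made for illustration: the `STObject` type, the linear blend of spatial proximity and Jaccard text overlap, the parameters `alpha` and `max_dist`, and the use of the mean pairwise similarity as the aggregate. Clusters are assumed to be given as lists of objects, so the clustering step is omitted.

```python
from dataclasses import dataclass
from itertools import product
from math import hypot


@dataclass(frozen=True)
class STObject:
    """Hypothetical spatio-textual object: a 2D point plus a set of terms."""
    x: float
    y: float
    terms: frozenset


def object_similarity(a, b, alpha=0.5, max_dist=10.0):
    """Assumed metric: weighted blend of spatial proximity and Jaccard overlap."""
    # Spatial part: 1 at distance 0, falling linearly to 0 at max_dist.
    spatial = max(0.0, 1.0 - hypot(a.x - b.x, a.y - b.y) / max_dist)
    # Textual part: Jaccard similarity of the two term sets.
    union = a.terms | b.terms
    textual = len(a.terms & b.terms) / len(union) if union else 0.0
    return alpha * spatial + (1.0 - alpha) * textual


def cluster_similarity(c1, c2, alpha=0.5, max_dist=10.0):
    """Assumed aggregate: mean similarity over all cross-cluster object pairs."""
    if not c1 or not c2:
        return 0.0
    total = sum(object_similarity(a, b, alpha, max_dist)
                for a, b in product(c1, c2))
    return total / (len(c1) * len(c2))


def naive_stoc_join(clusters1, clusters2, theta, alpha=0.5, max_dist=10.0):
    """Return index pairs (i, j) whose cluster similarity is at least theta."""
    return [
        (i, j)
        for i, c1 in enumerate(clusters1)
        for j, c2 in enumerate(clusters2)
        if cluster_similarity(c1, c2, alpha, max_dist) >= theta
    ]
```

This baseline compares every cluster pair and, within each pair, every object pair, so its cost grows with the product of the input sizes. That cost is what challenge (3) above, filtering out unqualified candidate pairs, aims to avoid.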



