An Efficient Algorithm for Spatio-Textual Object Cluster Join
Big Data Research (IF 3.5), Pub Date: 2021-02-05, DOI: 10.1016/j.bdr.2021.100191
Mingming Chen, Ning Wang, Daxin Zhu, Jedi S. Shang

With the proliferation of GPS-enabled devices and location-based services, spatio-textual objects have come to play an indispensable role in spatial data management, which makes it important to support join operations between groups of spatio-textual objects. In this paper, we study a novel problem, spatio-textual object cluster join (STOC-Join). Given two sets of spatio-textual objects D1 and D2 and a similarity threshold θ, STOC-Join finds all object cluster pairs whose spatio-textual similarity is no less than θ. STOC-Join is useful in a variety of application scenarios, including location-based event detection, location-based data cleaning, and, more generally, location-based social media data pre-processing. Efficient processing of STOC-Join is challenging in three respects: (1) how to define the spatio-textual similarity between two clusters of spatio-textual objects and compute it effectively; (2) how to cluster a large number of spatio-textual objects efficiently; and (3) how to find similar cluster pairs efficiently while filtering out unqualified candidate pairs. To address these challenges, we define an effective, easy-to-compute similarity metric that measures the aggregated similarity between two groups of spatio-textual objects. Based on this metric, we propose a novel two-phase matching algorithm that clusters a large number of spatio-textual objects and efficiently finds all qualifying cluster pairs. Experiments on large real-life datasets confirm that the proposed two-phase matching algorithm achieves high efficiency compared with straightforward methods.
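The abstract does not spell out the paper's similarity metric or its two-phase matching algorithm, so the block below is only a minimal sketch of a straightforward baseline for the STOC-Join problem statement, not the authors' method. It assumes: (a) object similarity is a weighted sum, with a hypothetical weight `alpha`, of spatial proximity (linearly normalized by a hypothetical radius `max_dist`) and Jaccard keyword overlap; (b) cluster similarity is a symmetric average best-match aggregation; (c) clustering is a simple grid partition with a hypothetical cell size `cell`. All names (`STObject`, `object_sim`, `cluster_sim`, `grid_clusters`, `bbox_dist`, `stoc_join`) are illustrative.

```python
from dataclasses import dataclass
from math import hypot
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class STObject:
    """A spatio-textual object: a 2-D location plus a set of keywords."""
    x: float
    y: float
    keywords: FrozenSet[str]


Cluster = List[STObject]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Textual similarity as Jaccard overlap; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def object_sim(o1: STObject, o2: STObject, alpha: float = 0.5, max_dist: float = 1.0) -> float:
    """Hypothetical object score: alpha * spatial proximity + (1 - alpha) * Jaccard."""
    spatial = max(0.0, 1.0 - hypot(o1.x - o2.x, o1.y - o2.y) / max_dist)
    return alpha * spatial + (1.0 - alpha) * jaccard(o1.keywords, o2.keywords)


def cluster_sim(c1: Cluster, c2: Cluster, alpha: float = 0.5, max_dist: float = 1.0) -> float:
    """Hypothetical aggregation over non-empty clusters: symmetric average of each
    object's best match in the other cluster."""
    def avg_best(src: Cluster, dst: Cluster) -> float:
        return sum(max(object_sim(a, b, alpha, max_dist) for b in dst) for a in src) / len(src)
    return 0.5 * (avg_best(c1, c2) + avg_best(c2, c1))


def grid_clusters(objs: List[STObject], cell: float) -> List[Cluster]:
    """Stand-in clustering phase: group objects that fall into the same grid cell."""
    groups: Dict[Tuple[int, int], Cluster] = {}
    for o in objs:
        groups.setdefault((int(o.x // cell), int(o.y // cell)), []).append(o)
    return [groups[k] for k in sorted(groups)]


def bbox_dist(c1: Cluster, c2: Cluster) -> float:
    """Distance between bounding boxes: a lower bound on any object-to-object distance."""
    dx = max(0.0, min(o.x for o in c2) - max(o.x for o in c1), min(o.x for o in c1) - max(o.x for o in c2))
    dy = max(0.0, min(o.y for o in c2) - max(o.y for o in c1), min(o.y for o in c1) - max(o.y for o in c2))
    return hypot(dx, dy)


def stoc_join(d1: List[STObject], d2: List[STObject], theta: float, cell: float = 1.0,
              alpha: float = 0.5, max_dist: float = 1.0) -> List[Tuple[int, int, float]]:
    """Return (i, j, score) for every cluster pair whose score is >= theta (nested loop)."""
    c1s, c2s = grid_clusters(d1, cell), grid_clusters(d2, cell)
    result = []
    for i, a in enumerate(c1s):
        for j, b in enumerate(c2s):
            # Filter: if the clusters are at least max_dist apart, the spatial term is 0 for
            # every object pair, so the cluster score is at most (1 - alpha); prune if < theta.
            if bbox_dist(a, b) >= max_dist and (1.0 - alpha) < theta:
                continue
            score = cluster_sim(a, b, alpha, max_dist)
            if score >= theta:
                result.append((i, j, score))
    return result
```

For instance, with two small object sets clustered around (0, 0) and (5, 5) and θ = 0.6, the far-apart cluster pairs are pruned by the bounding-box check without scoring, and only the pair near the origin, whose keywords overlap, is returned. The bounding-box test illustrates the kind of candidate filtering named in challenge (3), but it is sound only for this particular assumed metric; the pairwise loop over clusters is the quadratic "straightforward" approach the abstract contrasts with, not the proposed two-phase algorithm.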




Updated: 2021-02-08