Efficient top-k recently-frequent term querying over spatio-temporal textual streams
Information Systems (IF 3.0), Pub Date: 2020-12-05, DOI: 10.1016/j.is.2020.101687
Thu-Lan Dam, Sean Chester, Kjetil Nørvåg, Quang-Huy Duong

Massive amounts of data with spatio-temporal-textual information are being generated due to the proliferation of GPS-equipped mobile devices. Much of this data consists of social media posts, often used to share and spread personal updates and news. Extracting valuable information from a dynamic collection of social posts is of great interest and has attracted many studies. However, because the volume of data is huge, existing methods mostly work with the time window model, in which old data is discarded. In this work, we introduce the task of efficiently discovering the top-k most popular terms within a user-specified bounded region over a stream of social posts, where recent posts are more important than old ones. To make this feasible, we propose a hybrid index structure and algorithms to efficiently answer such top-k queries. Our index employs a spatial index augmented by top-k time-weighted term lists and a bulk updating technique to support fast digestion of social post streams. Further, these top-k term lists are employed in the aggregation step to produce the final results so that incoming queries can be efficiently processed. An extensive experimental study with a large collection of social posts shows that the proposed methods are capable of both online aggregation and accurate query processing.
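
To make the query type concrete, the following is a minimal sketch of a time-weighted top-k term query over a bounded region. It is not the paper's hybrid index: the abstract does not specify the weighting function or the spatial structure, so this sketch assumes exponential decay with a configurable half-life, a uniform grid, exact per-cell term scores instead of top-k term lists, and timestamps that arrive in non-decreasing order. The class `DecayedGridIndex` and all of its parameters are hypothetical names chosen for illustration.

```python
import heapq
import math
from collections import defaultdict


class DecayedGridIndex:
    """Illustrative only: uniform grid with exponentially decayed term scores."""

    def __init__(self, cell_size=1.0, half_life=3600.0):
        self.cell_size = cell_size
        # Assumed weighting: a post's weight halves every `half_life` time units.
        self.decay = math.log(2) / half_life
        # cell -> term -> (score, timestamp at which that score was computed)
        self.cells = defaultdict(dict)

    def _cell(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def _decayed(self, score, since, now):
        return score * math.exp(-self.decay * (now - since))

    def insert(self, x, y, t, terms):
        """Add one post; existing scores are decayed lazily up to time t."""
        cell = self.cells[self._cell(x, y)]
        for term in terms:
            score, since = cell.get(term, (0.0, t))
            cell[term] = (self._decayed(score, since, t) + 1.0, t)

    def top_k(self, x_min, y_min, x_max, y_max, now, k):
        """Aggregate decayed scores over every cell touching the rectangle.

        The region is resolved at cell granularity, so posts just outside
        the rectangle but inside a touched cell are also counted.
        """
        cx0, cy0 = self._cell(x_min, y_min)
        cx1, cy1 = self._cell(x_max, y_max)
        totals = defaultdict(float)
        for (cx, cy), terms in self.cells.items():
            if cx0 <= cx <= cx1 and cy0 <= cy <= cy1:
                for term, (score, since) in terms.items():
                    totals[term] += self._decayed(score, since, now)
        return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])


index = DecayedGridIndex(cell_size=1.0, half_life=10.0)
index.insert(0.5, 0.5, 0.0, ["rain", "storm"])
index.insert(0.6, 0.4, 10.0, ["rain"])
index.insert(5.5, 5.5, 10.0, ["sun"])
print(index.top_k(0.0, 0.0, 0.9, 0.9, now=10.0, k=2))
```

Decaying each score lazily, only when it is touched or read, avoids rewriting every stored term as time advances. Because old posts fade rather than being dropped, this also differs from the time window model the abstract contrasts against.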




Updated: 2020-12-10