Real-time spatio-temporal event detection on geotagged social media
Journal of Big Data ( IF 8.6 ) Pub Date : 2021-06-24 , DOI: 10.1186/s40537-021-00482-2
Yasmeen George , Shanika Karunasekera , Aaron Harwood , Kwan Hui Lim

A key challenge in mining social media data streams is to identify events that are actively discussed by a group of people in a specific local or global area. Such events are useful for early warning of accidents, protests, elections or breaking news. However, neither the list of events nor their temporal and spatial resolution is fixed or known beforehand. In this work, we propose an online spatio-temporal event detection system using social media that is able to detect events at different time and space resolutions. First, to address the unknown spatial resolution of events, a quad-tree method is used to split the geographical space into multiscale regions based on the density of social media data. Then, a statistical unsupervised approach, involving a Poisson distribution and a smoothing method, is applied to highlight regions with an unexpected density of social posts. Further, event duration is precisely estimated by merging events happening in the same region at consecutive time intervals. A post-processing stage is introduced to filter out events that are spam, fake or wrong. Finally, we incorporate simple semantics by using social media entities to assess the integrity and accuracy of detected events. The proposed method is evaluated on datasets from two social media platforms, Twitter and Flickr, covering four cities: Melbourne, London, Paris and New York. To verify the effectiveness of the proposed method, we compare our results with two baseline algorithms, one based on a fixed split of the geographical space and one based on a clustering method. For performance evaluation, we manually compute recall and precision. We also propose a new quality measure named strength index, which automatically measures how accurate the reported event is.
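
To make the first two stages of the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a simple splitting rule (a cell is divided into four while it holds more than `max_points` posts), uses exponential smoothing of past counts as a stand-in for the unspecified smoothing method, and flags a region when the Poisson upper-tail probability of its current count falls below a threshold `alpha`. All names and parameter values (`build_quadtree`, `max_points`, `max_depth`, `smoothing`, `alpha`) are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. longitude and latitude


@dataclass
class QuadNode:
    x0: float
    y0: float
    x1: float
    y1: float
    points: List[Point] = field(default_factory=list)
    children: List["QuadNode"] = field(default_factory=list)


def build_quadtree(points, bounds, max_points=4, max_depth=8, depth=0):
    """Split a cell into four while it holds more than max_points posts.

    The density criterion and both limits are assumptions for this sketch.
    """
    x0, y0, x1, y1 = bounds
    node = QuadNode(x0, y0, x1, y1, list(points))
    if len(points) <= max_points or depth >= max_depth:
        return node  # sparse enough (or too deep): keep as a leaf region
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    quads = [(x0, y0, mx, my), (mx, y0, x1, my),
             (x0, my, mx, y1), (mx, my, x1, y1)]
    buckets = [[] for _ in quads]
    for x, y in points:
        # Bit 0 selects the right half, bit 1 selects the upper half.
        idx = (1 if x >= mx else 0) + (2 if y >= my else 0)
        buckets[idx].append((x, y))
    node.points = []
    node.children = [
        build_quadtree(b, q, max_points, max_depth, depth + 1)
        for b, q in zip(buckets, quads)
    ]
    return node


def leaves(node):
    """Return the leaf regions of the tree (the multiscale cells)."""
    if not node.children:
        return [node]
    out = []
    for child in node.children:
        out.extend(leaves(child))
    return out


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def is_unexpected(count, history, alpha=0.01, smoothing=0.5):
    """Flag a region whose current count is unlikely under a Poisson model.

    The expected rate is an exponentially smoothed average of past counts;
    this smoothing choice and alpha are assumptions, not taken from the paper.
    """
    lam = float(history[0])
    for c in history[1:]:
        lam = smoothing * c + (1 - smoothing) * lam
    lam = max(lam, 1e-9)  # avoid a zero rate for empty histories
    return poisson_tail(count, lam) < alpha
```

In this sketch, dense areas end up covered by many small leaf cells and sparse areas by a few large ones, so the same per-region Poisson test can be applied at whatever spatial scale the data supports. A region with past counts of `[5, 6, 4, 5]` per interval would be flagged at a current count of 30 but not at a count of 5.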




Updated: 2021-06-24