A topic modeling framework for spatio-temporal information management. Information Processing & Management

Information Processing & Management (IF 7.4), Pub Date: 2020-07-06, DOI: 10.1016/j.ipm.2020.102340
Mohsen Asghari, Daniel Sierra-Sosa, Adel S Elmaghraby

Real-time processing of, and learning from, conflicting data, especially messages that originate from different ideas, locations, and times, in a dynamic environment such as Twitter is a challenging task that has recently attracted considerable attention. This paper introduces a framework for managing, processing, analyzing, detecting, and tracking topics in streaming data. To address the challenge of online topic detection, we propose a model selection procedure with a hybrid indicator. Within this framework, we built an automatic data processing pipeline with two levels of cleaning, regular and deep, both of which draw on multiple sources of meta-knowledge to improve data quality. Deep learning and transfer learning techniques are used to classify health-related tweets with high accuracy and an improved F1-score. The system also uses visualization to give a clearer picture of trending topics. To demonstrate the validity of the framework, we implemented it and applied it to nine months of health-related Twitter data from users in the USA. The results show that the framework detects and tracks topics at a level comparable to manual annotation. To better explain how topics emerge and change across locations over time, the results are displayed graphically on a map of the United States.
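The abstract does not say which measures the hybrid indicator combines. As a minimal sketch under that uncertainty, the snippet below assumes the selector compares candidate topic models (for example, fitted with different numbers of topics) and scores each one with a weighted blend of min-max-normalized topic coherence (higher is better) and perplexity (lower is better). The names `Candidate`, `select_model`, and the weight `alpha` are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A fitted topic model summarized by two quality scores (hypothetical)."""
    num_topics: int
    coherence: float   # assumed metric: higher is better
    perplexity: float  # assumed metric: lower is better


def _min_max(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def select_model(candidates, alpha=0.5):
    """Return (best_candidate, score) under an assumed hybrid indicator.

    hybrid = alpha * norm(coherence) + (1 - alpha) * (1 - norm(perplexity))
    """
    if not candidates:
        raise ValueError("no candidate models to select from")
    coh = _min_max([c.coherence for c in candidates])
    perp = _min_max([c.perplexity for c in candidates])
    # Invert normalized perplexity so that, for both terms, larger is better.
    scores = [alpha * c + (1 - alpha) * (1 - p) for c, p in zip(coh, perp)]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]


if __name__ == "__main__":
    # Made-up scores for illustration only; these are not results from the paper.
    pool = [
        Candidate(num_topics=5, coherence=0.41, perplexity=820.0),
        Candidate(num_topics=10, coherence=0.48, perplexity=790.0),
        Candidate(num_topics=20, coherence=0.45, perplexity=760.0),
    ]
    chosen, score = select_model(pool)
    print(chosen.num_topics, round(score, 3))
```

Min-max normalization puts both measures on a common [0, 1] scale before weighting. In an online setting, a selector like this would presumably be re-run on each new batch of tweets, with `alpha` trading topic interpretability against statistical fit.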

Updated: 2020-07-06