Using text mining to track outbreak trends in global surveillance of emerging diseases: ProMED-mail
The Journal of the Royal Statistical Society, Series A (Statistics in Society) (IF 1.5), Pub Date: 2021-07-01, DOI: 10.1111/rssa.12721
Jingxian You, Paul Expert, Céire Costelloe

ProMED-mail (Program for Monitoring Emerging Diseases) is an international disease outbreak monitoring and early warning system. Every year, users contribute thousands of reports that refer to infectious diseases and toxins. However, because reports are unevenly distributed across diseases, traditional statistics-based text mining techniques, such as term-frequency-based algorithms, are not suitable. We therefore conducted a study in three steps, (i) report filtering, (ii) keyword extraction from reports and (iii) word co-occurrence network analysis, to bridge the gap between the ProMED data and its practical use. Keyword extraction was performed with the TextRank algorithm; keyword co-occurrence networks were then built from the top keywords of each document, and multiple network centrality measures were computed to analyse these networks. We used two major recent outbreaks, Ebola (2014) and Zika (2015), as cases to illustrate and validate the process. We found that the extracted information structures are consistent with the World Health Organisation's description of the timelines and phases of the epidemics. Our research presents a pipeline that can extract and organize information to characterize the evolution of epidemic outbreaks. It also highlights the potential for ProMED to be used in monitoring, evaluating and improving responses to outbreaks.
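Steps (ii) and (iii) of the pipeline can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation, and several choices are assumptions: a tiny hypothetical stopword list replaces the part-of-speech filtering often paired with TextRank, the word graph links adjacent words (`window=2`), the damping factor is the conventional PageRank value of 0.85 with a fixed number of iterations instead of a convergence check, two keywords are linked when they appear together in one report's top-k list (edge weight = number of such reports), and degree centrality plus strength (weighted degree) stand in for the "multiple network centrality measures" the study computed. Report filtering (step i) is not shown, and the example reports are invented.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical minimal stopword list (assumption); a real pipeline would use a fuller one.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "been", "by", "for", "from",
    "has", "have", "in", "is", "of", "on", "the", "to", "was", "were", "with",
}


def tokenize(text):
    """Lowercase, keep alphabetic runs, drop stopwords and words of <= 2 letters."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS and len(w) > 2]


def textrank_keywords(text, top_k=5, window=2, damping=0.85, iters=50):
    """Return the top_k words ranked by TextRank (PageRank on a word graph)."""
    tokens = tokenize(text)
    neighbours = defaultdict(set)
    # Undirected edge between distinct words that co-occur in a sliding window of `window` tokens.
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)
    nodes = list(neighbours)
    score = {w: 1.0 for w in nodes}
    # Fixed number of synchronous PageRank updates (assumption: no convergence test).
    for _ in range(iters):
        score = {
            w: (1 - damping)
            + damping * sum(score[v] / len(neighbours[v]) for v in neighbours[w])
            for w in nodes
        }
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(nodes, key=lambda w: (-score[w], w))[:top_k]


def keyword_network(documents, top_k=5):
    """Edge weight = number of documents whose top-k keywords include both words."""
    weights = defaultdict(int)
    for doc in documents:
        keywords = sorted(set(textrank_keywords(doc, top_k=top_k)))
        for pair in combinations(keywords, 2):
            weights[pair] += 1
    return dict(weights)


def degree_centrality(weights):
    """Fraction of the other keywords each keyword is directly linked to."""
    adjacency = defaultdict(set)
    for a, b in weights:
        adjacency[a].add(b)
        adjacency[b].add(a)
    n = len(adjacency)
    return {w: len(nbrs) / (n - 1) if n > 1 else 0.0 for w, nbrs in adjacency.items()}


def strength(weights):
    """Weighted degree: sum of the co-occurrence weights on a keyword's edges."""
    totals = defaultdict(int)
    for (a, b), w in weights.items():
        totals[a] += w
        totals[b] += w
    return dict(totals)


if __name__ == "__main__":
    # Toy stand-ins for filtered ProMED reports (invented for illustration).
    reports = ["zika microcephaly brazil", "zika brazil", "ebola guinea"]
    net = keyword_network(reports)
    print(net)
    print(degree_centrality(net))
    print(strength(net))
```

The pure-Python version keeps the sketch dependency-free; for real report collections, a graph library such as NetworkX offers PageRank and a wider range of centrality measures.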

Updated: 2021-07-01