

Retrospective and prospective approaches of coronavirus publications in the last half-century: a Latent Dirichlet allocation analysis
Library Hi Tech (IF 1.623), Pub Date: 2021-04-01, DOI: 10.1108/lht-09-2020-0216
Farshid Danesh, Meisam Dastani, Mohammad Ghorbani

Purpose

The primary purpose of this article is to apply topic modeling to the global coronavirus publications of the last 50 years.

Design/methodology/approach

This applied study uses text mining. The statistical population comprises the coronavirus publications collected from the Web of Science Core Collection (1970–2020). The main keywords for the search strategy were extracted from the Medical Subject Headings (MeSH) browser. Latent Dirichlet allocation, implemented in the Python programming language, was used to analyze the data and run the text mining algorithms for topic modeling.
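
To illustrate what Latent Dirichlet allocation does with a corpus, the following is a minimal sketch of collapsed Gibbs sampling in plain Python. It is not the authors' implementation: the abstract does not say which library, preprocessing, or hyperparameters were used, so the function names (`fit_lda`, `top_words`), the values of `alpha`, `beta`, and `iterations`, and the toy documents are all assumptions made for illustration.

```python
import random


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling on tokenized documents.

    Returns (doc_topic, topic_word, vocab), where doc_topic[d][k] counts the
    tokens of document d assigned to topic k and topic_word[k][v] counts the
    assignments of vocabulary word v to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    return doc_topic, topic_word, vocab


def top_words(topic_word, vocab, n=3):
    """Return the n most frequently assigned words for each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


# Toy tokenized "abstracts" (invented, not taken from the study's data).
toy_docs = [
    ["spike", "protein", "structure", "protein"],
    ["protein", "structure", "binding", "spike"],
    ["outbreak", "transmission", "epidemiology", "outbreak"],
    ["transmission", "epidemiology", "outbreak", "cases"],
]

doc_topic, topic_word, vocab = fit_lda(toy_docs, num_topics=2)
for k, words in enumerate(top_words(topic_word, vocab)):
    print(f"topic {k}: {words}")
```

In practice a maintained library implementation would normally be preferred over a hand-written sampler; the sketch only makes the mechanics visible: each token is repeatedly reassigned to a topic in proportion to how common that topic is in its document and how common the word is in that topic.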

Findings

The findings indicate that SARS, science, protein, MERS, veterinary, cell, human, RNA, medicine and virology are the most important keywords in the global coronavirus publications. The topic modeling algorithm also identified eight important topics in these publications. In descending order of number of publications, the topics are: “structure and proteomics,” “Cell signaling and immune response,” “clinical presentation and detection,” “Gene sequence and genomics,” “Diagnosis tests,” “vaccine and immune response and outbreak,” “Epidemiology and Transmission” and “gastrointestinal tissue.”
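
One common way to rank topics by number of publications is to assign each document to its highest-weighted topic and count the assignments. The sketch below shows that approach under stated assumptions: the abstract does not describe how publications were attributed to topics, and the function name `rank_topics_by_publications`, the placeholder labels, and the weights are invented for illustration.

```python
from collections import Counter


def rank_topics_by_publications(doc_topic_weights, topic_labels):
    """Rank topics by how many documents have them as their dominant topic."""
    dominant = [
        max(range(len(weights)), key=weights.__getitem__)
        for weights in doc_topic_weights
    ]
    counts = Counter(dominant)
    order = sorted(range(len(topic_labels)), key=lambda k: -counts[k])
    return [(topic_labels[k], counts[k]) for k in order]


# Hypothetical per-document topic weights (each row sums to 1).
toy_weights = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.4, 0.1],
    [0.3, 0.6, 0.1],
]
toy_labels = ["topic A", "topic B", "topic C"]  # placeholder labels

ranking = rank_topics_by_publications(toy_weights, toy_labels)
print(ranking)  # [('topic A', 3), ('topic B', 2), ('topic C', 1)]
```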

Originality/value

The originality of this article lies in three aspects. First, text mining and Latent Dirichlet allocation were applied to the coronavirus literature for the first time. Second, coronavirus is a hot research topic. Finally, beyond the retrospective collection and analysis of 50 years of data, the results can be used prospectively for strategic planning and macro-level policymaking.
