Navigating the landscape of COVID-19 research through literature analysis: A bird's eye view, arXiv - CS - Digital Libraries



Navigating the landscape of COVID-19 research through literature analysis: A bird's eye view
arXiv - CS - Digital Libraries, Pub Date: 2020-08-07, DOI: arxiv-2008.03397
Lana Yeganova, Rezarta Islamaj, Qingyu Chen, Robert Leaman, Alexis Allot, Chin-Hsuan Wei, Donald C. Comeau, Won Kim, Yifan Peng, W. John Wilbur, Zhiyong Lu

Timely access to accurate scientific literature in the battle with the ongoing COVID-19 pandemic is critical. This unprecedented public health risk has motivated research towards understanding the disease in general, identifying drugs to treat the disease, developing potential vaccines, etc. This has given rise to a rapidly growing body of literature whose number of publications doubles every 20 days as of May 2020. Providing medical professionals with means to quickly analyze the literature and discover growing areas of knowledge is necessary for addressing their questions and information needs. In this study we analyze the LitCovid collection, 13,369 COVID-19 related articles found in PubMed as of May 15th, 2020, with the purpose of examining the landscape of the literature and presenting it in a format that facilitates information navigation and understanding. We do that by applying state-of-the-art named entity recognition (NER), classification, clustering and other NLP techniques. By applying NER tools, we capture relevant bioentities (such as diseases, internal body organs, etc.) and assess the strength of their relationship with COVID-19 by the extent to which they are discussed in the corpus. We also collect a variety of symptoms and co-morbidities discussed in reference to COVID-19. Our clustering algorithm identifies topics represented by groups of related terms, and computes clusters corresponding to documents associated with the topic terms. Among the topics we observe several that persist over multiple weeks and have numerous associated documents, as well as several that appear as emerging topics with fewer documents. All the tools and data are publicly available, and this framework can be applied to any literature collection. Taken together, these analyses produce a comprehensive, synthesized view of COVID-19 research to facilitate knowledge discovery from literature.
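To give a sense of what "doubles every 20 days" implies, the arithmetic can be sketched as below. This is only an illustration and not part of the paper: the function names are hypothetical, and the projection assumes the May 2020 doubling rate simply continues, which the abstract does not claim.

```python
# Illustrative arithmetic only; not taken from the paper.
# Assumption: growth is exponential with a fixed doubling time.

DOUBLING_DAYS = 20      # from the abstract: count doubles every 20 days
BASE_COUNT = 13_369     # from the abstract: LitCovid articles as of May 15th, 2020


def daily_growth_rate(doubling_days: float) -> float:
    """Fractional growth per day implied by a given doubling time."""
    return 2 ** (1 / doubling_days) - 1


def projected_count(current: float, days_ahead: float,
                    doubling_days: float = DOUBLING_DAYS) -> float:
    """Hypothetical extrapolation: count after days_ahead at a constant doubling time."""
    return current * 2 ** (days_ahead / doubling_days)


if __name__ == "__main__":
    # A 20-day doubling time corresponds to roughly 3.5% growth per day.
    print(f"daily growth: {daily_growth_rate(DOUBLING_DAYS):.2%}")
    # One doubling period later, the count would be twice the base count.
    print(f"after 20 days: {projected_count(BASE_COUNT, 20):.0f}")
```

Under that constant-rate assumption, a 20-day doubling time works out to about 3.5% more articles per day, which is the scale of growth the analysis tools are meant to help readers keep up with.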


Updated: 2020-09-15
