Ribonucleic acid (RNA) virus and coronavirus in Google Dataset Search: their scope and epidemiological correlation
arXiv - CS - Digital Libraries Pub Date : 2021-01-09 , DOI: arxiv-2101.03339
Manuel Blázquez-Ochando, Juan-José Prieto-Gutiérrez

This paper analyzes the publication of datasets on families of RNA viruses collected via Google Dataset Search, using terminology obtained from the National Cancer Institute (NCI) thesaurus developed by the US Department of Health and Human Services. The objective is to determine the scope and reuse capacity of the available data: the number of datasets and whether they are freely accessible, the proportion available in reusable download formats, the main providers, and the publication chronology, and to verify their scientific provenance. We also identify possible relationships between the publication of datasets and the main pandemics of the last 10 years. The results show that only 52% of the datasets are related to scientific research, while an even smaller fraction (15%) are reusable. There is also an upward trend in the publication of datasets, particularly in connection with the main epidemics, as clearly confirmed for the Ebola virus, Zika, SARS-CoV, H1N1, H1N5, and especially the SARS-CoV-2 coronavirus. Finally, the search engine has not yet implemented adequate methods for filtering and monitoring the datasets. These results reveal some of the difficulties facing open science in the dataset field.

Updated: 2021-01-12