Assigning credit to scientific datasets using article citation networks


arXiv - CS - Information Retrieval. Pub Date: 2020-01-16, DOI: arxiv-2001.05917
Tong Zeng, Longfeng Wu, Sarah Bratt, Daniel E. Acuna

A citation is a well-established mechanism for connecting scientific artifacts. Citation analysis uses citation networks for a variety of purposes, most prominently to give credit to scientists' work. However, under current citation practices, scientists tend to cite only publications and leave out other types of artifacts such as datasets. As a result, datasets do not receive appropriate credit even though they are increasingly reused and experimented with. We develop a network flow measure, called DataRank, aimed at closing this gap. DataRank assigns a relative value to each node in the network based on how citations flow through the graph, differentiating the flow rates of publications and datasets. We evaluate the quality of DataRank by estimating its accuracy at predicting the usage of real datasets: web visits to GenBank and downloads of Figshare datasets. We show that DataRank predicts this usage better than alternatives while offering additional interpretable outcomes. We discuss improvements to citation behavior and algorithms to properly track and assign credit to datasets.
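
To make the idea of citation flow with type-specific rates concrete, here is a minimal sketch of a PageRank-style iteration in which publications and datasets pass credit along their outgoing citations at different rates. This is an illustration only and is not the authors' DataRank formulation, which the abstract does not specify. The function name `flow_scores`, the `damping` values, and the toy graph are all assumptions made for this example.

```python
from collections import defaultdict


def flow_scores(citations, node_type, damping, iterations=50):
    """Iterate a PageRank-style flow with a per-type damping factor.

    citations: list of (citing, cited) pairs.
    node_type: dict mapping node -> "publication" or "dataset".
    damping:   dict mapping type -> fraction of a node's score that is
               passed along its outgoing citations (hypothetical values).
    """
    nodes = sorted(node_type)
    out_links = defaultdict(list)
    for citing, cited in citations:
        out_links[citing].append(cited)

    n = len(nodes)
    scores = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: 0.0 for v in nodes}
        leaked = 0.0  # score that is not forwarded along citations
        for v in nodes:
            d = damping[node_type[v]]
            targets = out_links[v]
            if targets:
                share = d * scores[v] / len(targets)
                for t in targets:
                    new[t] += share
                leaked += (1.0 - d) * scores[v]
            else:
                # Nodes that cite nothing return all of their score.
                leaked += scores[v]
        # Redistribute the leaked score uniformly so the total stays 1.
        for v in nodes:
            new[v] += leaked / n
        scores = new
    return scores


# Toy network (assumed): three publications, two datasets.
citations = [("p1", "d1"), ("p2", "d1"), ("p2", "p1"), ("p3", "d2")]
node_type = {
    "p1": "publication", "p2": "publication", "p3": "publication",
    "d1": "dataset", "d2": "dataset",
}
damping = {"publication": 0.85, "dataset": 0.5}  # assumed rates

scores = flow_scores(citations, node_type, damping)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 4))
```

In this toy graph, dataset `d1` is cited by two publications, one of which is itself cited, so it ends up with a higher score than `d2`. Keeping separate rates per node type is the simplest way to express that credit may travel differently through datasets than through articles.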


Updated: 2020-01-17
