GitHub Repositories with Links to Academic Papers: Open Access, Traceability, and Evolution
arXiv - CS - Digital Libraries Pub Date : 2020-04-01 , DOI: arxiv-2004.00199
Supatsara Wattanakriengkrai, Bodin Chinthanet, Hideaki Hata, Raula Gaikovina Kula, Christoph Treude, Jin Guo, Kenichi Matsumoto

Traceability between published scientific breakthroughs and their implementations is essential, especially when Open Source Software implements bleeding-edge science in its code. However, aligning the links between GitHub repositories and academic papers can prove difficult, and the impact of these links remains unknown. This paper investigates the role of academic paper references contained in these repositories. We conducted a large-scale study of 20 thousand GitHub repositories to establish the prevalence of references to academic papers. We use a mixed-methods approach to identify the Open Access (OA), traceability, and evolutionary aspects of the links. Although referencing a paper is not typical, we find that a vast majority of referenced academic papers are OA. In terms of traceability, our analysis reveals that machine learning is the most prevalent topic among the repositories, and that these repositories tend to be affiliated with academic communities. More than half of the papers do not link back to any repository. A case study of referenced arXiv papers shows that most of these papers are high-impact and influential, align with academia, and are referenced by repositories written in different programming languages. From the evolutionary aspect, we find very few changes to the papers being referenced or to the links to them.

Updated: 2020-04-03