What makes a popular academic AI repository?,Empirical Software Engineering


What makes a popular academic AI repository?
Empirical Software Engineering (IF 4.1) Pub Date: 2021-01-01, DOI: 10.1007/s10664-020-09916-6
Yuanrui Fan, Xin Xia, David Lo, Ahmed E. Hassan, Shanping Li

Many AI researchers publish the code, data and other resources that accompany their papers in GitHub repositories. In this paper, we refer to these repositories as academic AI repositories. Our preliminary study shows that highly cited papers are more likely to have popular academic AI repositories (and vice versa). Hence, we perform an empirical study of academic AI repositories to highlight, for AI researchers, the good software engineering practices of popular academic AI repositories. We collect 1,149 academic AI repositories, label the 20% of repositories with the most stars as popular, and label the bottom 70% as unpopular. The remaining 10% of repositories serve as a gap between the popular and unpopular groups. We propose 21 features to characterize the software engineering practices of academic AI repositories. Our experimental results show that popular and unpopular academic AI repositories differ statistically significantly in 11 of the studied features, indicating that the two groups of repositories follow significantly different software engineering practices. Furthermore, we find that the number of links to other GitHub repositories in the README file, the number of images in the README file and the inclusion of a license are the most important features for differentiating the two groups of academic AI repositories. Our dataset and code are made publicly available to share with the community.
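The labeling scheme described in the abstract (top 20% by stars as popular, bottom 70% as unpopular, the 10% in between as a gap) can be illustrated with a minimal sketch. This is not the authors' code: the function name `label_by_stars`, the rounding of the cutoffs, the tie-breaking by name, and the star counts are all assumptions made for illustration.

```python
from typing import Dict


def label_by_stars(
    stars: Dict[str, int],
    top_frac: float = 0.20,
    bottom_frac: float = 0.70,
) -> Dict[str, str]:
    """Label each repository 'popular', 'gap', or 'unpopular' by star rank."""
    # Rank from most to fewest stars; breaking ties by name is an assumption,
    # since the abstract does not say how ties are handled.
    ranked = sorted(stars, key=lambda name: (-stars[name], name))
    n = len(ranked)
    # Rounding the group sizes is also an assumption.
    n_top = round(n * top_frac)
    n_bottom = round(n * bottom_frac)

    labels: Dict[str, str] = {}
    for i, name in enumerate(ranked):
        if i < n_top:
            labels[name] = "popular"
        elif i >= n - n_bottom:
            labels[name] = "unpopular"
        else:
            # Repositories between the two cutoffs form the gap.
            labels[name] = "gap"
    return labels


# Hypothetical star counts for ten made-up repositories.
star_counts = [950, 400, 120, 80, 60, 33, 20, 9, 4, 1]
example = {f"repo{i}": s for i, s in enumerate(star_counts)}
labels = label_by_stars(example)
```

With ten repositories, the two most-starred ones are labeled popular, the third falls into the gap, and the remaining seven are labeled unpopular. Leaving a gap keeps repositories near the boundary out of both groups, so the two groups being compared are clearly separated by star count.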


Updated: 2021-01-01
