Software provenance tracking at the scale of public source code
Empirical Software Engineering (IF 3.5), Pub Date: 2020-05-29, DOI: 10.1007/s10664-020-09828-5
Guillaume Rousseau, Roberto Di Cosmo, Stefano Zacchiroli

We study the possibilities for tracking the provenance of software source code artifacts within the largest publicly accessible corpus of source code, the Software Heritage archive, with over 4 billion unique source code files and 1 billion commits capturing their development histories across 50 million software projects. We perform a systematic and generic estimate of the replication factor across the different layers of this corpus, analysing how often the same artifacts (e.g., SLOC, files or commits) appear in different contexts (e.g., files, commits or source code repositories). We observe a combinatorial explosion in the number of identical source code files across different commits. To discuss the implications of these findings, we benchmark different data models for capturing software provenance information at this scale, and we identify a viable solution, based on the properties of isochrone subgraphs, that is deployable on commodity hardware, is incremental and appears to be maintainable for the foreseeable future. Using these properties, we quantify, at a scale never achieved previously, the growth rate of original, i.e. never-seen-before, source code files and commits, and find it to be exponential over a period of more than 40 years.
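
The notion of a replication factor mentioned in the abstract can be illustrated with a minimal sketch. It assumes the factor for one artifact is simply the number of distinct contexts in which that artifact occurs; the paper's exact definition and its per-layer estimates may differ. The function name `replication_factors`, the identifiers such as `file_a` and `commit_1`, and the toy data are all hypothetical and are not taken from the Software Heritage archive.

```python
from collections import defaultdict
from statistics import mean


def replication_factors(occurrences):
    """Map each artifact ID to the number of distinct contexts containing it.

    `occurrences` is an iterable of (artifact_id, context_id) pairs, for
    example (file content hash, commit ID). This pairing is an assumption
    made for illustration only.
    """
    contexts_by_artifact = defaultdict(set)
    for artifact_id, context_id in occurrences:
        # A set ensures a repeated (artifact, context) pair is counted once.
        contexts_by_artifact[artifact_id].add(context_id)
    return {a: len(c) for a, c in contexts_by_artifact.items()}


# Hypothetical toy data: which file contents appear in which commits.
occurrences = [
    ("file_a", "commit_1"),
    ("file_a", "commit_2"),
    ("file_a", "commit_3"),
    ("file_b", "commit_2"),
    ("file_a", "commit_2"),  # duplicate pair, does not raise the count
]

factors = replication_factors(occurrences)
average_factor = mean(factors.values())
```

The same function applies to any pair of layers (lines in files, files in commits, commits in repositories) by changing what the two elements of each pair stand for. At the scale described in the abstract, an in-memory dictionary would not be practical; the sketch only conveys the quantity being estimated.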

Updated: 2020-05-29