Determining the Intrinsic Structure of Public Software Development History
arXiv - CS - Software Engineering, Pub Date: 2020-11-16, DOI: arxiv-2011.07914
Antoine Pietri (DGD-I), Guillaume Rousseau (UP, DGD-I), Stefano Zacchiroli (UP, DGD-I)

Background. Collaborative software development has produced a wealth of version control system (VCS) data that can now be analyzed in full. Little is known, however, about the intrinsic structure of the entire corpus of publicly available VCS data viewed as one interconnected graph. Understanding that structure is necessary to choose the best approach for analyzing the corpus in full and to avoid methodological pitfalls when doing so.

Objective. We intend to determine the most salient network topology properties of public software development history as captured by VCS. We will explore: degree distributions, determining whether or not they are scale-free; the distribution of connected component sizes; and the distribution of shortest path lengths.

Method. We will use Software Heritage, the largest corpus of public VCS data, compress it using webgraph compression techniques, and analyze it in memory using classic graph algorithms. Analyses will be performed both on the full graph and on relevant subgraphs.

Limitations. The study is exploratory in nature; as such, no hypotheses about the findings are stated at this time. The chosen graph algorithms are expected to scale to the size of the corpus, but this will need to be confirmed experimentally. External validity will depend on how representative Software Heritage is of the software commons.
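As a minimal sketch of the three measurements listed in the Objective, the Python below computes them on a tiny hypothetical graph. The node names (`rev3`, `dir3`, `file_a`, and so on) and the edge list are invented for illustration, loosely mimicking revisions that point to parent revisions and root directories, and directories that point to files. It also assumes choices the abstract does not specify: components are computed as weakly connected (edge direction ignored), and shortest paths are directed BFS distances over all reachable pairs. It uses plain in-memory adjacency lists, not the compressed webgraph representation the authors plan to use.

```python
from collections import Counter, deque

# Hypothetical toy graph: each edge points from a node to a node it references
# (revision -> parent revision, revision -> directory, directory -> file).
EDGES = [
    ("rev3", "rev2"), ("rev2", "rev1"),
    ("rev3", "dir3"), ("rev2", "dir2"), ("rev1", "dir1"),
    ("dir3", "file_a"), ("dir3", "file_b"),
    ("dir2", "file_a"), ("dir1", "file_a"),  # file_a is shared across revisions
    ("revX", "dirX"), ("dirX", "file_z"),    # an unrelated second history
]


def build_adjacency(edges):
    """Return an out-adjacency map (covering every node) and in-degree counts."""
    out_adj, in_deg = {}, Counter()
    for src, dst in edges:
        out_adj.setdefault(src, []).append(dst)
        out_adj.setdefault(dst, [])  # make sure sink nodes are present too
        in_deg[dst] += 1
    return out_adj, in_deg


def degree_distribution(degrees):
    """Map each degree value to the number of nodes that have it."""
    return dict(sorted(Counter(degrees).items()))


def weakly_connected_component_sizes(out_adj):
    """Sizes of weakly connected components (direction ignored), found by BFS."""
    undirected = {n: set() for n in out_adj}
    for src, dsts in out_adj.items():
        for dst in dsts:
            undirected[src].add(dst)
            undirected[dst].add(src)
    seen, sizes = set(), []
    for start in undirected:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in undirected[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        sizes.append(size)
    return sorted(sizes, reverse=True)


def shortest_path_length_distribution(out_adj):
    """Histogram of directed BFS distances over all reachable pairs (src != dst)."""
    hist = Counter()
    for src in out_adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nb in out_adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        for d in dist.values():
            if d > 0:
                hist[d] += 1
    return dict(sorted(hist.items()))


if __name__ == "__main__":
    out_adj, in_deg = build_adjacency(EDGES)
    print(degree_distribution([in_deg[n] for n in out_adj]))          # → {0: 2, 1: 8, 3: 1}
    print(degree_distribution([len(v) for v in out_adj.values()]))     # → {0: 3, 1: 5, 2: 3}
    print(weakly_connected_component_sizes(out_adj))                   # → [8, 3]
    print(shortest_path_length_distribution(out_adj))                  # → {1: 11, 2: 8, 3: 1}
```

Two caveats apply to this sketch. Deciding whether a degree distribution is scale-free would additionally require fitting and testing a power-law model against histograms like these, which is not attempted here. And running BFS from every node costs roughly O(V·E), so on a graph the size of Software Heritage one would likely have to sample source nodes instead; that is exactly the kind of scalability question the Limitations paragraph leaves to experiment.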

Updated: 2020-11-17