An automated approach to assess the similarity of GitHub repositories
Software Quality Journal (IF 1.9), Pub Date: 2020-02-15, DOI: 10.1007/s11219-019-09483-0
Phuong T. Nguyen, Juri Di Rocco, Riccardo Rubei, Davide Di Ruscio

Open source software (OSS) allows developers to study, change, and improve code free of charge. Several high-quality software projects deliver stable and well-documented products. Most OSS forges sustain active user and expert communities, which in turn provide decent levels of support, both in answering user questions and in repairing reported software bugs. Code reuse is an intrinsic feature of OSS, and developing a new system by leveraging existing open source components can reduce development effort, which benefits at least two phases of the software life cycle: implementation and maintenance. However, to improve software quality, it is essential to develop a system by learning from well-defined, mature projects. In this sense, the ability to find similar projects that facilitate ongoing development activities is of high importance. In this paper, we address the issue of mining open source software repositories to detect similar projects that developers can eventually reuse. We propose CrossSim, a novel approach to model the OSS ecosystem and to compute similarities among software projects. An evaluation on a dataset collected from GitHub shows that our proposed approach outperforms three well-established baselines.

Updated: 2020-02-15