How different are different diff algorithms in Git?
Empirical Software Engineering (IF 3.5), Pub Date: 2019-09-11, DOI: 10.1007/s10664-019-09772-z
Yusuf Sulistyo Nugroho, Hideaki Hata, Kenichi Matsumoto

Automatically identifying the differences between two versions of a file is a common, basic task in many applications that mine code repositories. Git, a version control system, provides a diff utility that lets users choose among several diff algorithms, from the default Myers algorithm to the more advanced Histogram algorithm. From our systematic mapping, we identified three popular applications of diff in recent studies. For code churn metrics in 14 Java projects, the values differed in 1.7% to 8.2% of commits depending on the diff algorithm. For bug-introducing change identification, 6.0% and 13.3% of the identified bug-fix commits in 10 Java projects yielded different bug-introducing changes. For patch application, our manual analysis found Histogram more suitable than Myers for presenting code changes. We therefore strongly recommend using the Histogram algorithm when mining Git repositories to analyze differences in source code.
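To make the code churn comparison concrete, the following is a minimal sketch of how per-commit added and deleted line counts might be compared between two diff algorithms. It is not the authors' tooling. It assumes a local Git installation and relies on Git's `--diff-algorithm` and `--numstat` options; the helper names `parse_numstat`, `churn`, and `differing_files` are hypothetical and introduced only for illustration.

```python
import subprocess
from typing import Dict, Optional, Tuple

Stats = Dict[str, Tuple[int, int]]


def parse_numstat(text: str) -> Stats:
    """Parse `--numstat` output into {path: (added, deleted)}."""
    stats: Stats = {}
    for line in text.splitlines():
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        added, deleted, path = parts
        if added == "-" or deleted == "-":
            continue  # binary files are reported as "-"
        stats[path] = (int(added), int(deleted))
    return stats


def churn(repo: str, commit: str, algorithm: str) -> Stats:
    """Per-file churn of one commit under the given diff algorithm.

    Hypothetical helper: shells out to `git show`; `algorithm` is one of
    the values Git accepts, e.g. "myers" or "histogram".
    """
    result = subprocess.run(
        ["git", "-C", repo, "show", "--numstat", "--format=",
         f"--diff-algorithm={algorithm}", commit],
        capture_output=True, text=True, check=True,
    )
    return parse_numstat(result.stdout)


def differing_files(
    a: Stats, b: Stats
) -> Dict[str, Tuple[Optional[Tuple[int, int]], Optional[Tuple[int, int]]]]:
    """Return the files whose (added, deleted) counts differ between a and b."""
    return {
        path: (a.get(path), b.get(path))
        for path in sorted(set(a) | set(b))
        if a.get(path) != b.get(path)
    }
```

For a given commit, `differing_files(churn(repo, sha, "myers"), churn(repo, sha, "histogram"))` returns an empty dictionary when both algorithms report the same churn. Counting the commits for which it is non-empty gives a rough analogue of the percentages reported above, although the study's exact procedure may differ.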

Updated: 2019-09-11