Can we benchmark Code Review studies? A systematic mapping study of methodology, dataset, and metric
Journal of Systems and Software (IF 3.5), Pub Date: 2021-05-25, DOI: 10.1016/j.jss.2021.111009
Dong Wang , Yuki Ueda , Raula Gaikovina Kula , Takashi Ishio , Kenichi Matsumoto

Context:

Code Review (CR) is a cornerstone of software quality assurance and a crucial practice in software development. As CR research matures, it can be difficult to keep track of best practices and the state of the art in methodology, dataset, and metric.

Objective:

This paper investigates the potential of benchmarking by collecting methodology, dataset, and metric of CR studies.

Methods:

A systematic mapping study was conducted. A total of 112 studies, drawn from 19,847 papers published in high-impact venues between 2011 and 2019, were selected and analyzed.

Results:

First, we find that empirical evaluation is the most common methodology (65% of papers), with solution and experience being the least common. Second, we highlight that 50% of the papers using quantitative or mixed methods have the potential for replicability. Third, we identify 457 metrics that are grouped into sixteen core metric sets, applied across nine Software Engineering topics, showing that different research topics tend to use specific metric sets.

Conclusion:

We conclude that at this stage, we cannot benchmark CR studies. Nevertheless, a common benchmark will facilitate new researchers, including experts from other fields, to innovate new techniques and build on top of already established methodologies. A full replication is available at https://naist-se.github.io/code-review/.




Updated: 2021-06-08