Comparing the results of replications in software engineering
Empirical Software Engineering (IF 3.5), Pub Date: 2021-02-02, DOI: 10.1007/s10664-020-09907-7
Adrian Santos, Sira Vegas, Markku Oivo, Natalia Juristo

Context

It has been argued that software engineering replications are useful for verifying the results of previous experiments. However, there is not yet agreement on how to check whether results hold across replications. In addition, some authors suggest that replications that fail to verify the results of previous experiments can be used to identify the contextual variables causing the discrepancies.

Objective

To study how to assess the (dis)similarity of the results of software engineering (SE) replications when they are compared to verify the results of previous experiments, and to understand how to identify whether contextual variables are influencing the results.

Method

We run simulations to learn how different ways of comparing replication results behave when verifying the results of previous experiments. We illustrate how to deal with context-induced changes. To do this, we analyze three groups of replications from our own research on test-driven development and testing techniques.
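
The kind of simulation described above can be illustrated with a minimal sketch. It is not the authors' simulation: the function names, the effect size, the sample size, and the use of a two-sided z-test with a known standard deviation are all assumptions made for illustration. It generates pairs of experiments that share the same true effect and counts how often both land on the same side of the significance threshold.

```python
import random
from statistics import NormalDist, mean


def simulate_p_value(rng, true_effect, n_per_group, sd=1.0):
    """Simulate one two-group experiment and return a two-sided z-test p-value.

    Assumption: outcomes are normal with a known, common standard deviation.
    """
    control = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
    treatment = [rng.gauss(true_effect, sd) for _ in range(n_per_group)]
    diff = mean(treatment) - mean(control)
    std_error = sd * (2.0 / n_per_group) ** 0.5
    z = diff / std_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def agreement_rate(true_effect, n_per_group, runs=2000, alpha=0.05, seed=1):
    """Share of baseline/replication pairs on the same side of alpha."""
    rng = random.Random(seed)
    agree = 0
    for _ in range(runs):
        p_baseline = simulate_p_value(rng, true_effect, n_per_group)
        p_replication = simulate_p_value(rng, true_effect, n_per_group)
        # Both significant or both non-significant counts as agreement.
        agree += (p_baseline < alpha) == (p_replication < alpha)
    return agree / runs


if __name__ == "__main__":
    # Hypothetical setting: medium effect, 20 participants per group.
    print(agreement_rate(true_effect=0.5, n_per_group=20))
```

Under these assumptions the agreement rate approaches p² + (1 − p)², where p is the statistical power of a single experiment, so two faithful replications of the same true effect can disagree on significance often when power is low.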

Results

The direct comparison of p-values and effect sizes does not appear to be suitable for verifying the results of previous experiments and examining the variables possibly affecting the results in software engineering. Analytical methods such as meta-analysis should be used to assess the similarity of software engineering replication results and identify discrepancies in results.
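
As a rough illustration of what a meta-analytic comparison involves, the sketch below pools per-replication effect sizes with a fixed-effect, inverse-variance model and reports Cochran's Q and the I² heterogeneity statistic. The abstract does not say which meta-analysis model the authors used, so the model choice, the function name `fixed_effect_meta`, and the input numbers are assumptions for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def fixed_effect_meta(effects, std_errors, confidence=0.95):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I^2."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)

    # Normal-approximation confidence interval for the pooled effect.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    ci = (pooled - z * pooled_se, pooled + z * pooled_se)

    # Cochran's Q: weighted squared deviations from the pooled effect.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation attributed to heterogeneity, floored at 0.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0

    return {"pooled": pooled, "ci": ci, "q": q, "df": df, "i_squared": i_squared}


if __name__ == "__main__":
    # Made-up effect sizes and standard errors, not data from the paper.
    result = fixed_effect_meta([0.10, 0.45, 0.30], [0.20, 0.25, 0.15])
    print(result)
```

A large I² would suggest that the replications differ by more than sampling error alone, which is the situation in which looking for contextual variables becomes worthwhile.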

Conclusion

The results of baseline experiments should no longer be regarded as findings that need to be reproduced, but as small pieces of evidence within a larger picture that only emerges after many such pieces have been assembled to complete the puzzle.




Updated: 2021-02-02