How detrimental is coincidental correctness to coverage-based fault detection and localization? An empirical study, Software Testing, Verification and Reliability

Software Testing, Verification and Reliability (IF 1.5), Pub Date: 2021-01-09, DOI: 10.1002/stvr.1762
Rawad Abou Assi, Wes Masri, Chadi Trad

According to the reachability–infection–propagation (RIP) model, three conditions must be satisfied for program failure to occur: (1) the defect's location must be reached, (2) the program's state must become infected and (3) the infection must propagate to the output. Weak coincidental correctness (or weak CC) occurs when the program produces the correct output, while condition (1) is satisfied but conditions (2) and (3) are not satisfied. Strong coincidental correctness (or strong CC) occurs when the output is correct, while both conditions (1) and (2) are satisfied but not (3). The prevalence of CC was previously recognized. In addition, the potential for its negative effect on spectrum-based fault localization (SBFL) was analytically demonstrated; however, this was not empirically validated. Using Defects4J, this paper empirically studies the impact of weak and strong CC on three well-researched coverage-based fault detection and localization techniques, namely, test suite reduction (TSR), test case prioritization (TCP) and SBFL. Our study, which involved 52 SBFL metrics, provides the following empirical evidence.
(i) The negative impact of CC tests on TSR and TCP is very significant. In addition, cleansing the CC tests was observed to yield (a) a 100% TSR defect detection rate for all subject programs and (b) an improvement of TCP for over 92% of the subjects.
(ii) The impact of CC tests on SBFL varies widely w.r.t. the metric used. The negative impact was strong for 11 metrics, mild for 37, non-measurable for 1 and non-existent for 3 metrics. Interestingly, the negative impact was mild for the 9 most popular and/or most effective SBFL metrics. In addition, cleansing the CC tests resulted in the deterioration of SBFL for a considerable number of subject programs.
(iii) Increasing the proportion of CC tests has a limited impact on TSR, TCP and SBFL. Interestingly, for TSR and TCP and 11 SBFL metrics, small and large proportions of CC tests are strongly harmful.
(iv) Lastly, weak and strong CC are equally detrimental in the context of TSR, TCP and SBFL.
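
The definitions above can be made concrete with a small sketch. The Python below is a hand-built toy, not data or tooling from the study: it labels each test run by the RIP conditions it satisfied, then computes an SBFL suspiciousness score before and after cleansing the CC tests. Ochiai is used only because it is a widely known SBFL metric; the abstract does not list the 52 metrics studied, so its inclusion there is an assumption. The statement names `s_fault` and `s_other`, the helper names and the test counts are all hypothetical.

```python
from math import sqrt

def classify(reached, infected, propagated):
    """Label a test run by which RIP conditions it satisfied."""
    if reached and infected and propagated:
        return "fail"        # all three conditions hold: observable failure
    if reached and infected:
        return "strong_cc"   # infected state never reached the output
    if reached:
        return "weak_cc"     # defect executed, state never infected
    return "pass"            # defect location never reached

def make_test(covered, reached=False, infected=False, propagated=False):
    """Hypothetical test record: covered statements plus an RIP label."""
    return {"covered": set(covered),
            "label": classify(reached, infected, propagated)}

def ochiai(tests, stmt):
    """Ochiai score: ef / sqrt(total_failed * (ef + ep))."""
    failed = [t for t in tests if t["label"] == "fail"]
    ef = sum(stmt in t["covered"] for t in failed)
    ep = sum(stmt in t["covered"] for t in tests if t["label"] != "fail")
    denom = sqrt(len(failed) * (ef + ep))
    return ef / denom if denom else 0.0

# Toy suite (assumed, not from Defects4J): s_fault is the defective statement.
suite = (
    [make_test({"s_fault", "s_other"}, True, True, True) for _ in range(2)]
    + [make_test({"s_fault"}, True) for _ in range(3)]        # weak CC
    + [make_test({"s_fault"}, True, True) for _ in range(3)]  # strong CC
    + [make_test({"s_other"}) for _ in range(2)]              # ordinary passing
)

# Cleansing assumes the CC tests are already known, which in practice
# requires knowing where the defect is.
cleansed = [t for t in suite if t["label"] not in ("weak_cc", "strong_cc")]

for name, tests in (("with CC", suite), ("cleansed", cleansed)):
    scores = {s: round(ochiai(tests, s), 3) for s in ("s_fault", "s_other")}
    print(name, scores)
```

In this constructed suite the defective statement scores 0.5 with the CC tests present, below the innocent statement's 0.707, and rises to 1.0 once they are removed. The example is deliberately built to show the mechanism; the study itself reports that the impact was mild for the most popular metrics and that cleansing made SBFL worse for a considerable number of subjects.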

Updated: 2021-01-09
