A comprehensive study of automatic program repair on the QuixBugs benchmark, Journal of Systems and Software

Journal of Systems and Software (IF 3.7), Pub Date: 2021-01-01, DOI: 10.1016/j.jss.2020.110825
He Ye, Matias Martinez, Thomas Durieux, Martin Monperrus

Abstract: Automatic program repair papers tend to repeatedly use the same benchmarks. This poses a threat to the external validity of the findings of the program repair research community. In this paper, we perform an empirical study of automatic repair on a benchmark of bugs called QuixBugs, which has been little studied. Specifically, (1) we report on the characteristics of QuixBugs; (2) we study the effectiveness of 10 program repair tools on it; (3) we apply three patch correctness assessment techniques to comprehensively study the presence of overfitting patches in QuixBugs. Our key results are: (1) 16/40 buggy programs in QuixBugs can be repaired with at least one test-suite adequate patch; (2) a total of 338 plausible patches are generated on QuixBugs by the considered tools, and 53.3% of them are overfitting patches according to our manual assessment; (3) the three automated patch correctness assessment techniques, RGT_Evosuite, RGT_InputSampling and GT_Invariants, achieve an accuracy of 98.2%, 80.8% and 58.3% in overfitting detection, respectively. To our knowledge, this is the largest empirical study of automatic repair on QuixBugs, combining both quantitative and qualitative insights. All our empirical results are publicly available on GitHub in order to facilitate future research on automatic program repair.
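
To make the terms "plausible patch" and "overfitting patch" concrete, here is a minimal Python sketch. It is a toy illustration only: the functions, the test suite, and the random-input check are all hypothetical and are not taken from QuixBugs or from the study's tooling. The idea is loosely in the spirit of comparing a patch against a ground-truth program on newly sampled inputs; the techniques named in the abstract may work quite differently.

```python
import random

def reference_max(a, b):
    """Hypothetical ground-truth program, used here as the oracle."""
    return a if a >= b else b

def overfitting_patch(a, b):
    """Hypothetical patch: special-cases one test input, wrong in general."""
    if (a, b) == (1, 2):
        return 2
    return a

# Hypothetical original test suite: (arguments, expected result).
ORIGINAL_TESTS = [((1, 2), 2), ((5, 3), 5), ((4, 4), 4)]

def is_plausible(patch):
    """A patch is plausible (test-suite adequate) if it passes every original test."""
    return all(patch(*args) == expected for args, expected in ORIGINAL_TESTS)

def find_counterexample(patch, oracle, trials=1000, seed=0):
    """Sample random inputs; return the first one where patch and oracle disagree."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = rng.randint(-100, 100), rng.randint(-100, 100)
        if patch(a, b) != oracle(a, b):
            return (a, b)
    return None  # no disagreement observed; this is evidence, not proof, of correctness
```

In this sketch, `overfitting_patch` passes all three original tests, so it counts as plausible, yet random sampling quickly finds an input with `a < b` on which it disagrees with the reference. A patch for which no counterexample is found is not thereby proven correct, which is one reason automated assessment can fall short of 100% accuracy.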

Updated: 2021-01-01
