Improving deep‐learning‐based fault localization with resampling
Journal of Software: Evolution and Process (IF 1.7). Pub Date: 2020-08-26. DOI: 10.1002/smr.2312
Zhuo Zhang 1, 2 , Yan Lei 3, 4 , Xiaoguang Mao 2 , Meng Yan 3, 4 , Ling Xu 3, 4 , Junhao Wen 3, 4

Funding information: Guangxi Key Laboratory of Trusted Software, Grant/Award Number: kx202008; Fundamental Research Funds for the Central Universities, Grant/Award Number: 2019CDXYRJ0011; National Natural Science Foundation of China, Grant/Award Numbers: 61602504, 61379054, and 61672529; Scientific Research Fund of Hunan Provincial Education Department, Grant/Award Number: 15A007

Abstract: Many recent fault localization approaches utilize deep learning to learn an effective localization model, offering a fresh perspective with promising results. However, localization models are generally learned from class-imbalanced datasets; that is, failing test cases are far fewer than passing ones. This imbalance can substantially degrade the accuracy of the learned localization models. Thus, in this paper, we explore using data resampling to reduce the negative effect of the class imbalance problem and improve the accuracy of learned models in deep-learning-based fault localization. Specifically, the learning process of deep-learning-based fault localization may require duplicating essential data to strengthen the weak but beneficial information carried by the minority class of an imbalanced dataset. We leverage the pass/fail property of test cases to identify failing test cases as the essential data to duplicate, and propose an iterative oversampling approach that resamples failing test cases to produce a class-balanced test suite. We apply this test case resampling to representative deep-learning-based localization models. Our empirical results on eight large programs with real faults and four large programs with seeded faults show that test case resampling significantly improves fault localization effectiveness.
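To make the core idea concrete, the sketch below shows one way such an iterative oversampling step could look: failing test cases (the minority class) are duplicated until the test suite is roughly class-balanced, after which the balanced suite would be fed to a deep-learning-based localization model. The `TestCase` structure, the `oversample_failing` name, and the stopping criterion (duplicate until the failing count matches the passing count) are illustrative assumptions, not the paper's exact algorithm.

```python
# Minimal sketch (assumed names and stopping rule) of oversampling failing
# test cases to balance a test suite for deep-learning-based fault localization.
from dataclasses import dataclass
from typing import List
import random

@dataclass
class TestCase:
    coverage: List[int]   # e.g., statement coverage vector (1 = covered)
    failing: bool         # label: True if the test case fails

def oversample_failing(suite: List[TestCase], seed: int = 0) -> List[TestCase]:
    """Return a new suite in which randomly chosen failing test cases are
    duplicated iteratively until their count matches the passing ones."""
    rng = random.Random(seed)
    passing = [t for t in suite if not t.failing]
    failing = [t for t in suite if t.failing]
    if not failing or not passing:
        return list(suite)  # nothing to balance

    balanced = list(suite)
    n_failing = len(failing)
    while n_failing < len(passing):
        balanced.append(rng.choice(failing))  # duplicate a failing test case
        n_failing += 1
    return balanced

# Usage: an imbalanced suite of 95 passing and 5 failing tests becomes ~95/95.
suite = [TestCase([1, 0, 1], False) for _ in range(95)] + \
        [TestCase([1, 1, 0], True) for _ in range(5)]
balanced = oversample_failing(suite)
print(sum(t.failing for t in balanced), sum(not t.failing for t in balanced))
```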

Updated: 2020-08-26