A comprehensive investigation of the impact of feature selection techniques on crashing fault residence prediction models, Information and Software Technology



Information and Software Technology (IF 3.8), Pub Date: 2021-06-01, DOI: 10.1016/j.infsof.2021.106652
Kunsong Zhao, Zhou Xu, Meng Yan, Tao Zhang, Dan Yang, Wei Li

Context:

A software crash is a serious form of software failure that often occurs during software development and maintenance. Because the stack trace reported when software crashes contains a wealth of information about the crash, recent work has used classification models, built on features collected from stack traces and source code, to predict whether the fault causing the crash resides in the stack trace. Such predictions could speed up the crash localization task.

Objective:

As the quality of features can affect the performance of the constructed classification models, researchers have proposed using feature selection methods to select a representative feature subset that replaces the original features when building models. However, previous work considered only a limited number of feature selection methods and classification models. In this work, we examine this topic in depth to identify the best feature selection method for the crashing fault residence prediction task.

Method:

We study the performance of 24 feature selection techniques with 21 classification models on a benchmark dataset containing crash instances from 7 real-world software projects. We use 4 indicators to evaluate the performance of these feature selection methods when applied to the classification models.

Results:

The experimental results show that, overall, a probability-based feature selection method called Symmetrical Uncertainty performs well across the studied classification models and projects. Thus, we recommend using this feature selection method to preprocess the crash instances before constructing classification models to predict the crashing fault residence.
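
For readers unfamiliar with the recommended method, Symmetrical Uncertainty is commonly defined as SU(X, Y) = 2 · I(X; Y) / (H(X) + H(Y)), where H is Shannon entropy and I is mutual information; it ranges from 0 (independent) to 1 (fully predictive). The following is a minimal sketch of that standard definition for discrete features, not the authors' implementation. The function names, the feature names, and the toy data are all hypothetical, and continuous features would first need to be discretized.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        # Both variables are constant; treat them as carrying no information.
        return 0.0
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy    # I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


def rank_features(columns, labels):
    """Return feature names sorted by descending SU with the class label."""
    scores = {name: symmetrical_uncertainty(col, labels)
              for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: 1 = fault resides in the stack trace, 0 = it does not.
labels = [1, 1, 0, 0, 1, 0, 1, 0]
columns = {
    "copy_of_label": [1, 1, 0, 0, 1, 0, 1, 0],  # perfectly informative
    "constant": [0, 0, 0, 0, 0, 0, 0, 0],       # carries no information
    "noisy": [1, 0, 0, 0, 1, 0, 1, 1],          # partially informative
}

ranking = rank_features(columns, labels)
```

In a filter-style setup like this, the top-ranked features would be kept and the rest dropped before training a classifier; how many to keep is a separate choice that this sketch does not address.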

Conclusion:

This work conducts a large-scale empirical study of the impact of feature selection methods on the performance of classification models for the crashing fault residence prediction task. The results clearly demonstrate that significant performance differences exist among these feature selection techniques across different classification models and projects.
