A comprehensive investigation of the impact of feature selection techniques on crashing fault residence prediction models
Information and Software Technology (IF 3.8), Pub Date: 2021-06-01, DOI: 10.1016/j.infsof.2021.106652
Kunsong Zhao, Zhou Xu, Meng Yan, Tao Zhang, Dan Yang, Wei Li

Context:

A software crash is a serious form of software failure that often occurs during software development and maintenance. Because the stack trace reported at crash time contains a wealth of information about the crash, recent work has used classification models, built on features collected from stack traces and source code, to predict whether the fault causing the crash resides in the stack trace. Such a prediction can speed up the crash localization task.

Objective:

Because feature quality can affect the performance of the resulting classification models, researchers have proposed using feature selection methods to choose a representative feature subset that replaces the original features when building models. However, previous work considered only a limited set of feature selection methods and classification models. In this work, we examine the topic in depth and identify the best feature selection method for the crashing fault residence prediction task.

Method:

We study the performance of 24 feature selection techniques combined with 21 classification models on a benchmark dataset containing crash instances from 7 real-world software projects. We use 4 indicators to evaluate the feature selection methods as applied to the classification models.

Results:

The experimental results show that, overall, a probability-based feature selection method called Symmetrical Uncertainty performs well across the studied classification models and projects. Thus, we recommend this feature selection method for preprocessing crash instances before constructing classification models to predict crashing fault residence.
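
For readers unfamiliar with the measure, Symmetrical Uncertainty is commonly defined as SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), where H is Shannon entropy and I is mutual information; it ranges from 0 (independent) to 1 (fully dependent). The following is a minimal sketch of that standard definition for discrete features, not the authors' implementation. The function names and the example feature names are hypothetical, and continuous features would need to be discretized first.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(feature, labels):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)) for two discrete sequences."""
    h_x = entropy(feature)
    h_y = entropy(labels)
    if h_x + h_y == 0:
        # Both variables are constant, so there is nothing to measure.
        return 0.0
    # Joint entropy H(X, Y), computed over paired observations.
    h_xy = entropy(list(zip(feature, labels)))
    mutual_info = h_x + h_y - h_xy
    return 2.0 * mutual_info / (h_x + h_y)


def rank_features(columns, labels):
    """Rank named feature columns by SU with the class label, best first."""
    scores = {name: symmetrical_uncertainty(col, labels)
              for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data with hypothetical feature names (not taken from the paper).
labels = [0, 1, 0, 1]  # e.g. 1 = fault resides in the stack trace
columns = {
    "feature_a": [0, 1, 0, 1],  # perfectly aligned with the label
    "feature_b": [0, 0, 1, 1],  # independent of the label
    "feature_c": [0, 0, 0, 0],  # constant, carries no information
}
ranking = rank_features(columns, labels)
```

In a filter-style workflow, the top-ranked features from such a ranking would be kept and the rest dropped before training a classifier; how many to keep is a separate choice that this sketch does not address.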

Conclusion:

This work conducts a large-scale empirical study of the impact of feature selection methods on the performance of classification models for the crashing fault residence prediction task. The results show significant performance differences among these feature selection techniques across different classification models and projects.




Updated: 2021-06-05