RFC: A feature selection algorithm for software defect prediction
Journal of Systems Engineering and Electronics (IF 1.9), Pub Date: 2021-05-12, DOI: 10.23919/jsee.2021.000032
Xu Xiaolong, Chen Wen, Wang Xinheng

Software defect prediction (SDP) applies statistical analysis to historical defect data to find the distribution pattern of past defects, so that defects in new software can be predicted effectively. However, software defect datasets contain redundant and irrelevant features that degrade the performance of defect predictors. To identify and remove these features, we propose ReliefF-based clustering (RFC), a cluster-based feature selection algorithm. The correlation between features is calculated based on symmetric uncertainty. According to the degree of correlation, RFC partitions the features into k clusters using the k-medoids algorithm, and finally selects representative features from each cluster to form the final feature subset. In the experiments, we compare the proposed RFC with classical feature selection algorithms on nine National Aeronautics and Space Administration (NASA) software defect prediction datasets in terms of area under the curve (AUC) and F-value. The experimental results show that RFC can effectively improve the performance of SDP.
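The clustering step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy feature values, the distance `1 - SU`, the naive medoid initialisation, and the choice of each cluster's medoid as its representative are all assumptions made for illustration. The abstract does not say how representatives are chosen, and the ReliefF weighting implied by the algorithm's name is not modelled here. Features are assumed to be already discretised.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 1.0  # both features constant: treat them as fully redundant
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return max(0.0, 2.0 * mutual_info / (hx + hy))


def assign(dist, medoids):
    """Map each medoid index to the indices of the points closest to it."""
    clusters = {m: [] for m in medoids}
    for i in range(len(dist)):
        # A medoid always stays in its own cluster, so no cluster is empty.
        nearest = i if i in clusters else min(medoids, key=lambda m: dist[i][m])
        clusters[nearest].append(i)
    return clusters


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix."""
    medoids = list(range(k))  # naive initialisation (assumption)
    for _ in range(max_iter):
        clusters = assign(dist, medoids)
        # New medoid: the member with the smallest total distance to its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return assign(dist, medoids)


def cluster_features(features, k):
    """Cluster feature columns by SU; return {representative: members}."""
    names = list(features)
    dist = [
        [1.0 - symmetric_uncertainty(features[a], features[b]) for b in names]
        for a in names
    ]
    clusters = k_medoids(dist, k)
    return {names[m]: [names[i] for i in members] for m, members in clusters.items()}


# Hypothetical, already-discretised toy features (not taken from the NASA datasets).
FEATURES = {
    "loc": [0, 0, 1, 1, 2, 2, 0, 1],
    "loc_copy": [0, 0, 1, 1, 2, 2, 0, 1],
    "branches": [0, 1, 0, 1, 0, 1, 0, 1],
    "branches_inv": [1, 0, 1, 0, 1, 0, 1, 0],
}

if __name__ == "__main__":
    for representative, members in cluster_features(FEATURES, k=2).items():
        print(representative, "represents", members)
```

In this toy input, `loc_copy` duplicates `loc` and `branches_inv` is a relabelling of `branches`, so each pair has symmetric uncertainty 1 and falls into the same cluster. Keeping one feature per cluster removes the redundancy.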

Updated: 2021-05-14