Software Defect Prediction Based on Fuzzy Weighted Extreme Learning Machine with Relative Density Information
Scientific Programming, Pub Date: 2020-11-18, DOI: 10.1155/2020/8852705
Shang Zheng, Jinjing Gai, Hualong Yu, Haitao Zou, Shang Gao

To identify software modules that are more likely to be defective, machine learning has been used to construct software defect prediction (SDP) models. However, several previous works have found that the imbalanced nature of software defect data can degrade model performance. This paper discusses how to handle imbalanced data distributions in the context of SDP so as to improve defect prediction. First, a relative density is introduced to reflect the significance of each instance within its class. Because relative density is independent of the scale of the data distribution in feature space, it can be more robust than absolute distance information. Second, a strategy similar to K-nearest-neighbors-based probability density estimation (KNN-PDE) is used to calculate the relative density of each training instance. Third, the fuzzy membership of each sample is designed from its relative density in order to eliminate classification errors caused by noise and outlier samples. Finally, two algorithms are proposed to train SDP models based on the weighted extreme learning machine. The proposed algorithms are compared with traditional SDP methods on benchmark data sets. The results show that the proposed methods achieve much better overall performance in terms of G-mean, AUC, and Balance. The proposed algorithms are more robust and adaptive to different SDP data distribution types; they estimate the significance of each instance more accurately and assign identical total fuzzy coefficients to the two classes regardless of data scale.
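
The pipeline outlined in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the membership functions or the details of the two proposed algorithms. The sketch assumes that relative density is the reciprocal of the distance to the k-th nearest same-class neighbour, that memberships are normalised so each class sums to 1 (one reading of "identical total fuzzy coefficients"), and that the classifier is the standard weighted ELM closed-form solution with a sigmoid hidden layer. All function names and parameters (`relative_density`, `fuzzy_weights`, `train_welm`, `predict_welm`, `g_mean`, `k`, `n_hidden`, `C`) are hypothetical.

```python
import numpy as np

def relative_density(X, k):
    """Assumed KNN-PDE-like estimate: reciprocal of the distance to the
    k-th nearest neighbour, computed among instances of one class."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    d.sort(axis=1)                      # column 0 is the self-distance (0)
    return 1.0 / np.maximum(d[:, k], 1e-12)

def fuzzy_weights(X, y, k=3):
    """Assumed membership rule: normalise relative density within each
    class so that every class receives the same total weight (1.0)."""
    w = np.zeros(len(y))
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        kk = min(k, len(idx) - 1)       # guard against very small classes
        rho = relative_density(X[idx], kk)
        w[idx] = rho / rho.sum()
    return w

def _hidden(X, A, b):
    # Sigmoid hidden layer with fixed random input weights A and biases b.
    return 1.0 / (1.0 + np.exp(-(X @ A + b)))

def train_welm(X, y, w, n_hidden=20, C=1.0, seed=0):
    """Standard weighted ELM solution:
    beta = (I/C + H^T W H)^{-1} H^T W t, with W = diag(w)."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = _hidden(X, A, b)
    t = np.where(y == 1, 1.0, -1.0)     # defective = +1, clean = -1
    HW = H.T * w                        # H^T W via broadcasting
    beta = np.linalg.solve(np.eye(n_hidden) / C + HW @ H, HW @ t)
    return A, b, beta

def predict_welm(model, X):
    A, b, beta = model
    return (_hidden(X, A, b) @ beta > 0).astype(int)

def g_mean(y_true, y_pred):
    """Geometric mean of the true positive and true negative rates."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tpr = np.mean(y_pred[y_true == 1] == 1)
    tnr = np.mean(y_pred[y_true == 0] == 0)
    return float(np.sqrt(tpr * tnr))
```

Under these assumptions, an instance far from the rest of its class receives a small membership, so it contributes little to the weighted least-squares fit, while the per-class normalisation keeps the minority (defective) class from being outweighed by the majority class.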

Updated: 2020-11-18