Software defect prediction with imbalanced distribution by radius-synthetic minority over-sampling technique, Journal of Software: Evolution and Process


Journal of Software: Evolution and Process (IF 1.7), Pub Date: 2021-06-05, DOI: 10.1002/smr.2362
Shikai Guo ₁,₂, Jian Dong ₂, Hui Li ₁, Jiahui Wang ₁


Software defect prediction, which identifies defect-prone modules, is an effective technique for ensuring the quality of software products. Because of its importance in software maintenance, many learning-based software defect prediction models have been presented in recent years. In practice, defects usually occupy a very small proportion of software source code; thus, the imbalanced distribution between defect-prone and non-defect-prone modules increases the learning difficulty of the classification task. To address this issue, we present a random over-sampling mechanism that generates minority-class samples from a high-dimensional sampling space to deal with the imbalanced distributions in software defect prediction. Two constraints are applied to provide a robust way to generate new synthetic samples: scaling the random over-sampling scope to a reasonable area, and distinguishing the majority-class samples in a critical region. Based on nine open datasets of software projects, we experimentally verify that the presented method is effective at predicting defect-prone modules and that its performance is superior to that of traditional imbalance-handling methods.
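The abstract gives no formulas for the two constraints, so the following is only a minimal sketch of the general idea and not the authors' algorithm. It assumes, purely for illustration, that the "reasonable area" is a ball around a minority sample whose radius is a fraction (the hypothetical `scale` parameter) of the distance to its nearest minority neighbour, and that a candidate is rejected when its closest original sample belongs to the majority class. The function name `radius_oversample` and all of its parameters are hypothetical.

```python
import numpy as np


def radius_oversample(X_min, X_maj, n_new, scale=0.5, max_tries=1000, seed=0):
    """Draw up to n_new synthetic minority samples near existing minority samples.

    Illustrative only: both the radius rule and the rejection rule below are
    assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    X_maj = np.asarray(X_maj, dtype=float)
    n_min, dim = X_min.shape
    if n_min < 2:
        raise ValueError("need at least two minority samples to define a radius")

    # Assumed constraint 1: limit the sampling scope of each seed to a fraction
    # of the distance to its nearest other minority sample.
    d_mm = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d_mm, np.inf)
    radii = scale * d_mm.min(axis=1)

    synthetic = []
    tries = 0
    while len(synthetic) < n_new and tries < max_tries:
        tries += 1
        i = rng.integers(n_min)
        # Uniform draw inside a dim-dimensional ball: random direction,
        # radius scaled by u ** (1 / dim).
        direction = rng.normal(size=dim)
        direction /= np.linalg.norm(direction)
        r = radii[i] * rng.random() ** (1.0 / dim)
        candidate = X_min[i] + r * direction

        # Assumed constraint 2: discard candidates whose nearest original
        # sample is a majority-class sample.
        d_to_maj = np.linalg.norm(X_maj - candidate, axis=1).min()
        d_to_min = np.linalg.norm(X_min - candidate, axis=1).min()
        if d_to_min <= d_to_maj:
            synthetic.append(candidate)

    # May return fewer than n_new rows if max_tries is exhausted.
    return np.array(synthetic).reshape(-1, dim)
```

In use, the returned rows would be appended to the original minority class before training a classifier. The pairwise distance matrix keeps the sketch short but costs memory quadratic in the number of minority samples, so a nearest-neighbour index would be a more practical choice on larger datasets.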


Updated: 2021-07-02
