An empirical study on predictability of software maintainability using imbalanced data, Software Quality Journal



An empirical study on predictability of software maintainability using imbalanced data
Software Quality Journal (IF 1.7), Pub Date: 2020-08-05, DOI: 10.1007/s11219-020-09525-y
Ruchika Malhotra, Kusum Lata

In predictive modeling for software engineering, early prediction of the software modules or classes that will require high maintainability effort is a challenging task. Many prediction models have been constructed to predict the maintainability of software classes or modules by applying various machine learning (ML) techniques. If the modules or classes that need high maintainability effort are few in a dataset, the data available to train the model is imbalanced. Imbalanced datasets bias ML techniques towards predicting the low maintainability effort (majority) class, and minority class instances tend to be discarded as noise. To address this, the paper presents empirical work on improving the performance of software maintainability prediction (SMP) models developed with ML techniques on imbalanced data. To develop the models, the imbalanced data is pre-processed with data resampling methods. Fourteen data resampling methods, covering oversampling, undersampling, and hybrid resampling, are used in the study. The results indicate that the safe-level synthetic minority oversampling technique (Safe-Level-SMOTE) is a useful method for dealing with imbalanced datasets and for developing competent prediction models to forecast software maintainability.
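
To make the oversampling idea concrete, the following is a minimal sketch of plain SMOTE-style interpolation in NumPy. It is not the authors' implementation and it is not Safe-Level-SMOTE, which additionally weights the interpolation by how many minority neighbours surround each point. The function name `smote_like_oversample`, its parameters, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (plain SMOTE-style, hypothetical helper)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    # Pick a base sample and one of its neighbours for every new point.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    # Place each new point at a random position on the connecting segment.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])

# Toy imbalanced data (assumed): 90 majority vs. 10 minority samples.
rng = np.random.default_rng(42)
X_major = rng.normal(0.0, 1.0, size=(90, 2))
X_minor = rng.normal(3.0, 0.5, size=(10, 2))

# Generate enough synthetic minority samples to balance the two classes.
X_synth = smote_like_oversample(X_minor, n_new=len(X_major) - len(X_minor))
X_bal = np.vstack([X_major, X_minor, X_synth])
y_bal = np.array([0] * len(X_major) + [1] * (len(X_minor) + len(X_synth)))
```

In practice, resampling of this kind would be applied only to the training split, so that the evaluation data keeps its original class distribution.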


Updated: 2020-08-05
