Investigating Tree Family Machine Learning Techniques for a Predictive System to Unveil Software Defects

Complexity (IF 1.7), Pub Date: 2020-11-30, DOI: 10.1155/2020/6688075
Rashid Naseem, Bilal Khan, Arshad Ahmad, Ahmad Almogren, Saima Jabeen, Bashir Hayat, Muhammad Arif Shah

Predicting software defects early in the software development life cycle remains a critical task. Defect prediction and correctness help assure the quality of software systems and have remained an active subject of study in recent years. Early identification of defective modules helps the development team use existing resources efficiently and deliver high-quality software products within a short timeline. To date, several researchers have developed defect prediction models using statistical and machine learning techniques, which are effective approaches for pinpointing defective modules. Tree family machine learning techniques are considered among the best and most commonly used supervised learning methods. In this study, different tree family machine learning techniques are employed for software defect prediction using ten benchmark datasets. These techniques include Credal Decision Tree (CDT), Cost-Sensitive Decision Forest (CS-Forest), Decision Stump (DS), Forest by Penalizing Attributes (Forest-PA), Hoeffding Tree (HT), Decision Tree (J48), Logistic Model Tree (LMT), Random Forest (RF), Random Tree (RT), and REP-Tree (REP-T). The performance of each technique is evaluated using several measures, namely mean absolute error (MAE), relative absolute error (RAE), root mean squared error (RMSE), root relative squared error (RRSE), specificity, precision, recall, F-measure (FM), G-measure (GM), Matthews correlation coefficient (MCC), and accuracy. Overall, the results favor the RF technique, which produced the best results in terms of both reducing error rates and increasing accuracy on five datasets, namely AR3, PC1, PC2, PC3, and PC4. The average accuracy achieved by RF is 90.2238%. The comprehensive results of this study can serve as a reference point for other researchers: any claim of improved prediction through a new model, technique, or framework can be benchmarked and verified against them.
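The evaluation measures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so the definitions below are the commonly used ones and should be read as assumptions: in particular, G-measure is taken here as the harmonic mean of recall and specificity, and RAE and RRSE are normalized against a baseline that always predicts the mean of the actual values. The function names and the example counts are hypothetical and are not taken from the paper.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Threshold-based measures from a binary confusion matrix.

    The defective class is treated as the positive class.
    No guard against zero denominators is included in this sketch.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    # Assumption: G-measure as the harmonic mean of recall and specificity.
    g_measure = 2 * recall * specificity / (recall + specificity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "FM": f_measure,
        "GM": g_measure,
        "MCC": mcc,
        "accuracy": accuracy,
    }


def error_metrics(actual, predicted):
    """MAE, RMSE, RAE, and RRSE for numeric targets and predictions.

    Assumption: the relative measures compare the model against a
    baseline that always predicts the mean of `actual`.
    """
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    base_abs = sum(abs(a - mean_actual) for a in actual)
    base_sq = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "MAE": abs_err / n,
        "RMSE": math.sqrt(sq_err / n),
        "RAE": abs_err / base_abs,
        "RRSE": math.sqrt(sq_err / base_sq),
    }


if __name__ == "__main__":
    # Illustrative counts only; these are not results from the study.
    print(confusion_metrics(tp=8, fp=2, tn=8, fn=2))
    print(error_metrics(actual=[1, 0, 1, 0], predicted=[1, 0, 0, 0]))
```

Lower values are better for the four error measures, and higher values are better for the remaining seven, which is why the abstract speaks of reducing error rates and increasing accuracy as separate criteria.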

Updated: 2020-12-01
