Software reuse analytics using integrated random forest and gradient boosting machine learning algorithm
Software: Practice and Experience (IF 3.5), Pub Date: 2020-10-27, DOI: 10.1002/spe.2921
Amandeep Kaur Sandhu, Ranbir Singh Batth

Cleaner Production (CP) is regarded as an influential route to sustainable production for production companies. CP mainly deals with the three R's: reuse, reduce, and recycle. For software enterprises, software reuse plays a pivotal role. Software reuse is the process of producing new products or software from existing software by updating it. Data mining is used to extract useful information from existing software. The algorithms currently used for software reuse face issues related to maintenance cost, accuracy, and performance, and they do not give accurate results on whether a software component can be reused. Machine learning gives the best results for predicting whether a given software component is reusable. This paper introduces an integrated Random Forest and Gradient Boosting Machine Learning Algorithm (RFGBM), which tests the reusability of given software code using object-oriented parameters such as cohesion, coupling, cyclomatic complexity, bugs, number of children, and depth of inheritance tree. The proposed algorithm is compared with the J48, AdaBoostM1, LogitBoost, PART, OneR, LMT, JRip, and DecisionStump algorithms. RFGBM improves performance metrics such as accuracy, error rate, Relative Absolute Error, and Mean Absolute Error. The algorithm also preprocesses the data with an unsupervised filter that removes missing values to improve efficiency. The proposed algorithm outperforms the existing algorithms in terms of these performance parameters.
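The abstract names four evaluation metrics without defining them. As a rough illustration only, the sketch below computes them for a binary "reusable / not reusable" classifier using the commonly used definitions. It is not taken from the paper: the function name `evaluate`, the 0.5 threshold, the sample data, and the choice of the mean of the evaluated labels as the Relative Absolute Error baseline are all assumptions made for this example.

```python
from statistics import mean


def evaluate(actual, predicted_prob, threshold=0.5):
    """Compute accuracy, error rate, MAE, and RAE for a binary classifier.

    actual:         list of 0/1 labels (1 = component is reusable)
    predicted_prob: list of predicted probabilities for class 1
    threshold:      assumed cut-off for turning a probability into a label
    """
    if not actual or len(actual) != len(predicted_prob):
        raise ValueError("inputs must be non-empty and of equal length")

    # Hard predictions are only needed for accuracy and error rate.
    predicted = [1 if p >= threshold else 0 for p in predicted_prob]
    accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)

    # Mean Absolute Error between true labels and predicted probabilities.
    mae = mean(abs(a - p) for a, p in zip(actual, predicted_prob))

    # Relative Absolute Error: MAE divided by the MAE of a trivial predictor
    # that always outputs the mean label (assumed baseline for this sketch).
    baseline = mean(actual)
    baseline_mae = mean(abs(a - baseline) for a in actual)
    rae = mae / baseline_mae if baseline_mae else float("inf")

    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "mae": mae,
        "rae": rae,
    }


if __name__ == "__main__":
    # Hypothetical labels and scores, not data from the paper.
    labels = [1, 0, 1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.2, 0.6, 0.4, 0.1, 0.7, 0.8, 0.3]
    print(evaluate(labels, scores))
```

An RAE below 1 means the classifier's probability estimates are closer to the true labels than the trivial mean predictor; lower is better for error rate, MAE, and RAE, while higher is better for accuracy.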

Updated: 2020-10-27