Inter-release defect prediction with feature selection using temporal chunk-based learning: An empirical study

Applied Soft Computing (IF 7.2), Pub Date: 2021-09-09, DOI: 10.1016/j.asoc.2021.107870
Md Alamgir Kabir¹, Jacky Keung¹, Burak Turhan²,³, Kwabena Ebo Bennin⁴

Inter-release defect prediction (IRDP) is a practical scenario that uses the dataset of the previous release to build a prediction model and predicts defects for the current release within the same software project. A practical software project goes through several releases, and the data of each release arrive as chunks in temporal order. The evolving data of each release introduce new concepts to the model, a phenomenon known as concept drift, which negatively impacts the performance of IRDP models. In this study, we examine and assess the impact of feature selection (FS) on the performance of IRDP models and on the robustness of the models to concept drift. We conduct empirical experiments using 36 releases of 10 open-source projects. The results of the Friedman test and the Nemenyi post-hoc test indicate statistically significant differences between the prediction results obtained with and without FS techniques. IRDP models trained on the data of the most recent releases were not always the best models. Furthermore, prediction models trained with carefully selected features could help reduce concept drift.
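
The chunk-based protocol described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the FS techniques or learners used, so the class-mean filter and the nearest-centroid classifier below are stand-ins, and all function names, release names, and data values are hypothetical. The only point carried over from the abstract is that each release is a chunk, and the model for a release is trained on earlier chunks only.

```python
from statistics import mean

# A release is a (name, rows) pair; each row is (feature_vector, label),
# where label 1 means "defective". Names and values are made up.
releases = [
    ("1.0", [([1.0, 5.0], 0), ([2.0, 1.0], 0), ([8.0, 4.0], 1), ([9.0, 2.0], 1)]),
    ("1.1", [([1.5, 2.0], 0), ([2.5, 6.0], 0), ([7.5, 3.0], 1), ([8.5, 5.0], 1)]),
    ("1.2", [([2.0, 4.0], 0), ([9.0, 4.0], 1)]),
]


def temporal_pairs(releases, window=1):
    """Yield (training_rows, test_name, test_rows) in release order.

    The model for release i only sees the `window` releases before it,
    so no information flows backwards in time.
    """
    for i in range(window, len(releases)):
        train = [row for _, rows in releases[i - window:i] for row in rows]
        name, test = releases[i]
        yield train, name, test


def class_means(rows, label):
    """Per-feature mean of the rows carrying `label`.

    Assumes at least one row with that label is present.
    """
    vectors = [x for x, y in rows if y == label]
    return [mean(col) for col in zip(*vectors)]


def select_features(rows, k):
    """Toy filter (an assumption): keep the k features whose class means differ most."""
    clean, defective = class_means(rows, 0), class_means(rows, 1)
    gaps = [abs(a - b) for a, b in zip(clean, defective)]
    ranked = sorted(range(len(gaps)), key=lambda j: gaps[j], reverse=True)
    return sorted(ranked[:k])


def predict(train, x, features):
    """Nearest-centroid prediction restricted to the selected features."""
    centroids = {label: class_means(train, label) for label in (0, 1)}

    def dist(label):
        return sum((x[j] - centroids[label][j]) ** 2 for j in features)

    return min((0, 1), key=dist)


def evaluate(releases, k, window=1):
    """Accuracy per target release when training on the preceding chunk(s)."""
    scores = {}
    for train, name, test in temporal_pairs(releases, window):
        features = select_features(train, k)
        hits = sum(predict(train, x, features) == y for x, y in test)
        scores[name] = hits / len(test)
    return scores


if __name__ == "__main__":
    print(evaluate(releases, k=1))
```

Raising `window` trains on more than one preceding release, which is one way to probe the abstract's observation that the most recent release is not always the best training chunk.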

Updated: 2021-09-16
