Analyzing the Stationarity Process in Software Effort Estimation Datasets, International Journal of Software Engineering and Knowledge Engineering



International Journal of Software Engineering and Knowledge Engineering (IF 0.6). Pub Date: 2021-01-22, DOI: 10.1142/s0218194020400239
Michael Franklin Bosu, Stephen G. MacDonell, Peter A. Whigham


Software effort estimation models are typically developed based on an underlying assumption that all data points are equally relevant to the prediction of effort for future projects. The dynamic nature of several aspects of the software engineering process could mean that this assumption does not hold in at least some cases. This study employs three kernel estimator functions to test the stationarity assumption in five software engineering datasets that have been used in the construction of software effort estimation models. The kernel estimators are used in the generation of nonuniform weights which are subsequently employed in weighted linear regression modeling. In each model, older projects are assigned smaller weights while the more recently completed projects are assigned larger weights, to reflect their potentially greater relevance to present or future projects that need to be estimated. Prediction errors are compared to those obtained from uniform models. Our results indicate that, for the datasets that exhibit underlying nonstationary processes, uniform models are more accurate than the nonuniform models; that is, models based on kernel estimator functions are worse than the models where no weighting was applied. In contrast, the accuracies of uniform and nonuniform models for datasets that exhibited stationary processes were essentially equivalent. Our analysis indicates that as the heterogeneity of a dataset increases, the effect of stationarity is overridden. The results of our study also confirm prior findings that the accuracy of effort estimation models is independent of the type of kernel estimator function used in model development.
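The weighting scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the three kernel functions, so the Gaussian, Epanechnikov, and triangular kernels here are assumptions, as are the `bandwidth` parameter, the function names `kernel_weights` and `weighted_linear_fit`, and the small synthetic dataset. Project age is measured from the most recent project (age 0), so older projects receive smaller weights, and a "uniform" option reproduces the unweighted baseline.

```python
import numpy as np

def kernel_weights(ages, bandwidth, kernel="gaussian"):
    """Map project ages (0 = most recent) to non-negative weights.

    The kernels are unnormalized; a constant scale factor does not
    change the weighted least-squares solution. Kernel choice and
    bandwidth are illustrative assumptions, not taken from the paper.
    """
    u = np.asarray(ages, dtype=float) / bandwidth
    if kernel == "gaussian":
        return np.exp(-0.5 * u**2)
    if kernel == "epanechnikov":
        return np.clip(1.0 - u**2, 0.0, None)
    if kernel == "triangular":
        return np.clip(1.0 - np.abs(u), 0.0, None)
    if kernel == "uniform":
        return np.ones_like(u)
    raise ValueError(f"unknown kernel: {kernel}")

def weighted_linear_fit(X, y, w):
    """Return [intercept, slope(s)] minimizing sum_i w_i * (y_i - a - b*x_i)^2."""
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])   # prepend intercept column
    sw = np.sqrt(np.asarray(w, dtype=float))    # scale rows by sqrt(weight)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef

# Synthetic data for illustration only (not from any of the five datasets).
ages = np.array([5, 4, 3, 2, 1, 0])                         # years since completion
size = np.array([100.0, 150.0, 120.0, 200.0, 180.0, 220.0])
effort = np.array([900.0, 1300.0, 1000.0, 1500.0, 1300.0, 1550.0])

for name in ("uniform", "gaussian", "epanechnikov", "triangular"):
    w = kernel_weights(ages, bandwidth=6.0, kernel=name)
    print(name, weighted_linear_fit(size, effort, w))
```

Comparing the prediction errors of the "uniform" fit against the kernel-weighted fits on held-out recent projects would mirror, in spirit, the comparison the abstract reports.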


Updated: 2021-01-22
