Enhancing software model encoding for feature location approaches based on machine learning techniques
Software and Systems Modeling (IF 2.0), Pub Date: 2021-08-23, DOI: 10.1007/s10270-021-00920-y
Ana C. Marcén 1,2, Francisca Pérez 1, Carlos Cetina 1, Óscar Pastor 2

Feature location is one of the main activities performed during software evolution. In our previous work, we proposed a machine-learning-based approach for feature location in models and provided evidence that machine learning techniques can obtain better results than other retrieval techniques for this task. However, to apply machine learning techniques optimally, the design of the encoding is essential for identifying the best realization of a feature. In this work, we present a more thorough study of software model encoding for feature location approaches based on machine learning. As part of this study, we propose two new software model encodings and compare them with the source encoding. The first proposed encoding extends the source encoding to take advantage not only of the main concepts and relations of a domain but also of the properties of these concepts and relations. The second proposed encoding is inspired by the characteristics used in benchmark datasets for research on Learning to Rank. The new encodings are then used to compare three machine learning techniques (RankBoost, Feedforward Neural Network, and Recurrent Neural Network). The study also considers whether a domain-independent encoding such as the ones proposed in this work can outperform an encoding that is specifically designed to exploit human experience and domain knowledge. Furthermore, the results of the best encoding and the best machine learning technique are compared with two traditional approaches that have been widely applied for feature location as well as for traceability link recovery and bug localization. The evaluation is based on two real-world case studies, one in the railway domain and the other in the induction hob domain. A feature location approach for models is run on these case studies with each of the encodings and machine learning techniques.
The results show that, when using the second proposed encoding and RankBoost, the approach outperforms the other encodings and machine learning techniques as well as the traditional approaches. Specifically, the approach achieved the best results for all the performance indicators, with a mean precision of 90.11%, a recall of 86.20%, an F-measure of 87.22%, and an MCC of 0.87. The statistical analysis of the results shows that this approach significantly improves the results and increases the magnitude of the improvement. The promising results of this work can serve as a starting point toward the use of machine learning techniques in other engineering tasks with software models, such as traceability or bug location.
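
The abstract reports four performance indicators but does not spell out how they are computed. As a rough illustration only, the sketch below uses the standard confusion-matrix definitions of precision, recall, F-measure, and the Matthews correlation coefficient (MCC). The function name `confusion_metrics` and the example counts are hypothetical and are not taken from the paper; the authors' exact evaluation procedure may differ.

```python
import math


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard confusion-matrix metrics for one feature location result.

    tp/fp/fn/tn are assumed to count model elements that were correctly
    retrieved, wrongly retrieved, wrongly missed, and correctly left out.
    Any ratio with a zero denominator is reported as 0.0 (a convention
    chosen for this sketch).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pr_sum = precision + recall
    # F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / pr_sum if pr_sum else 0.0
    # MCC ranges from -1 (total disagreement) to 1 (perfect prediction).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only (not data from the paper).
example = confusion_metrics(tp=8, fp=2, fn=2, tn=88)
```

Note that the reported values are means, and the reported F-measure (87.22%) is not the harmonic mean of the reported mean precision and mean recall (which would be about 88.11%). This is consistent with each indicator being computed per test case and then averaged, although the abstract does not say so explicitly.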




Updated: 2021-08-24