Topic modeling for feature location in software models: Studying both code generation and interpreted models, Information and Software Technology



Topic modeling for feature location in software models: Studying both code generation and interpreted models
Information and Software Technology (IF 3.8), Pub Date: 2021-07-01, DOI: 10.1016/j.infsof.2021.106676
Francisca Pérez, Raúl Lapeña, Ana C. Marcén, Carlos Cetina

Context:

In the last 20 years, the research community has increased its attention to the use of topic modeling for software maintenance and evolution tasks in code. Topic modeling is a popular and promising information retrieval technique that represents topics by word probabilities. Latent Dirichlet Allocation (LDA) is one of the most popular topic modeling methods. However, the use of topic modeling in model-driven software development has been largely neglected. Since software models have less noise (implementation details) than software code, software models might be well-suited for topic modeling.

Objective:

This paper presents our LDA-guided evolutionary approach for feature location in software models. Specifically, we consider two types of software models: models for code generation and interpreted models.

Method:

We evaluate our approach on two real-world industrial case studies: code-generation models for train control software, and interpreted models for a commercial video game. To study the impact on the results, we compare our approach for feature location in models against random search and against a baseline based on Latent Semantic Indexing, a popular information retrieval technique. In addition, we perform a statistical analysis of the results to show that this impact is significant. We also discuss the results in terms of the following aspects: data sparsity, implementation complexity, calibration, and stability.

Results:

Our approach significantly outperforms the baseline in terms of recall, precision and F-measure when it comes to interpreted models. This is not the case for code-generation models.
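
For readers unfamiliar with the three measures named above, the following is a minimal sketch of how recall, precision and F-measure are commonly computed for one feature location query. It is not taken from the paper: the function name `precision_recall_f1` and the element identifiers are hypothetical, and it assumes the result and the ground truth are both given as sets of model element identifiers.

```python
def precision_recall_f1(retrieved, relevant):
    """Compute precision, recall and F-measure (F1) for one query.

    retrieved: model elements returned by the approach (hypothetical IDs).
    relevant:  model elements that truly realize the feature (ground truth).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)

    # Precision: share of retrieved elements that are correct.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant elements that were retrieved.
    recall = true_positives / len(relevant) if relevant else 0.0
    # F-measure: harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative data only: four elements retrieved, three truly relevant.
retrieved = {"e1", "e2", "e3", "e4"}
relevant = {"e2", "e3", "e5"}
precision, recall, f1 = precision_recall_f1(retrieved, relevant)
```

In this illustrative case two of the four retrieved elements are relevant, so precision is 2/4, recall is 2/3, and the F-measure is 4/7.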

Conclusions:

Our analysis of the results yields a recommendation for improving the results. We also show that calibration approaches can be transferred from code to models. Our findings with regard to the compensation of instability have the potential to help feature location not only in models, but also in code.
