To Automatically Map Source Code Entities to Architectural Modules with Naive Bayes
arXiv - CS - Software Engineering Pub Date : 2021-09-20 , DOI: arxiv-2109.09525
Tobias Olsson, Morgan Ericsson, Anna Wingkvist

Background: The process of mapping a source code entity onto an architectural module is to a large degree a manual task. Automating this process could increase the use of static architecture conformance checking methods, such as reflexion modeling, in industry. Current techniques rely on user parameterization and a highly cohesive design. A machine learning approach would potentially require fewer parameters and make better use of the available information to aid in automatic mapping. Aim: We investigate how a classifier can be trained to map from source code to architecture modules automatically. This classifier is trained with semantic and syntactic dependency information extracted from the source code and from architecture descriptions. The classifier is implemented using multinomial naive Bayes and evaluated. Method: We perform experiments and compare the classifier with three state-of-the-art mapping functions in eight open-source Java systems with known ground-truth mappings. Results: We find that the classifier outperforms the state-of-the-art in all cases and that it provides a useful baseline for further research in the area of semi-automatic incremental clustering. Conclusions: We conclude that machine learning is a useful approach that performs better and with less need for parameterization compared to other approaches. Future work includes investigating problematic mappings and a more diverse set of subject systems.
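
As a rough illustration of the idea in the abstract, the following minimal sketch trains a multinomial naive Bayes classifier on entities that already have a known module mapping and then predicts a module for an unmapped entity. It is not the authors' implementation: the token format (identifier words plus `dep:out:<module>` markers for outgoing dependencies), the module names, and the functions `train` and `predict` are all assumptions made for this example, and Laplace (add-one) smoothing is one common choice rather than something the abstract specifies.

```python
import math
from collections import Counter, defaultdict


def train(mapped_entities):
    """Count token frequencies per module.

    mapped_entities: list of (tokens, module) pairs for entities whose
    architectural module is already known (hypothetical input format).
    """
    token_counts = defaultdict(Counter)  # module -> token frequencies
    module_counts = Counter()            # module -> number of mapped entities
    vocabulary = set()
    for tokens, module in mapped_entities:
        module_counts[module] += 1
        token_counts[module].update(tokens)
        vocabulary.update(tokens)
    return token_counts, module_counts, vocabulary


def predict(tokens, model):
    """Return the module with the highest log posterior for the tokens."""
    token_counts, module_counts, vocabulary = model
    total_entities = sum(module_counts.values())
    best_module, best_score = None, -math.inf
    for module, n_entities in module_counts.items():
        # Log prior: share of already-mapped entities in this module.
        score = math.log(n_entities / total_entities)
        # Add-one smoothing over the training vocabulary.
        denom = sum(token_counts[module].values()) + len(vocabulary)
        for token in tokens:
            if token in vocabulary:  # tokens never seen in training are skipped
                score += math.log((token_counts[module][token] + 1) / denom)
        if score > best_score:
            best_module, best_score = module, score
    return best_module


# Hypothetical training data: identifier words (semantic information) and
# dependency markers (syntactic information) for four mapped entities.
training = [
    (["order", "repository", "dep:out:persistence"], "persistence"),
    (["customer", "dao", "sql", "dep:out:persistence"], "persistence"),
    (["order", "view", "button", "dep:out:logic"], "gui"),
    (["customer", "dialog", "panel", "dep:out:logic"], "gui"),
]
model = train(training)
print(predict(["invoice", "dialog", "button", "dep:out:logic"], model))
```

In this sketch both kinds of information are simply merged into one bag of tokens per entity. How the paper actually encodes and weights dependency information is not stated in the abstract.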

Updated: 2021-09-21