Mining software architecture knowledge: Classifying stack overflow posts using machine learning
Concurrency and Computation: Practice and Experience (IF 2), Pub Date: 2021-03-31, DOI: 10.1002/cpe.6277
Mubashir Ali 1 , Husnain Mushtaq 2 , Muhammad B Rasheed 3, 4 , Anees Baqir 5 , Thamer Alquthami 6

The Software Architectural Process (SAP) is a core, highly knowledge-intensive phase of the software development life cycle, since it simultaneously consumes and produces knowledge artifacts. SAP is about making design decisions, and changes to these decisions may adversely affect software projects. Design decisions fundamentally influence the performance and properties of software components, and implementing immature or abrupt design decisions seriously threatens the SAP development process. Software architectural knowledge management (AKM) approaches offer systematic ways to support SAP through versatile architectural solutions and design decisions. However, the majority of software organizations have limited access to data and still depend on manually created and maintained AKM processes. In this paper, we use one of the most prominent online communities for software development, Stack Overflow, as a source of SAP knowledge to support AKM. We propose a supervised machine learning-based approach that classifies architectural knowledge into four predefined categories: analysis, synthesis, evaluation, and implementation. We employ different combinations of feature selection techniques to obtain the best classification results from four classifiers: Support Vector Machine (SVM), K-Nearest Neighbor, Random Forest, and Naive Bayes (NB). Among these classifiers, SVM with the uni-gram feature set provides the best classification results, attaining 85.80% accuracy. To evaluate the effectiveness of the proposed approach, we also compute the suitability of each classifier, that is, its cost of computation considered along with its accuracy; NB with the uni-gram feature set proved to be the most suitable.
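
To make the classification setup more concrete, the following is a minimal sketch of a uni-gram Naive Bayes text classifier over the four categories named in the abstract. It is not the authors' pipeline: the abstract does not give the dataset, preprocessing, or feature selection details, so the tokenizer, the Laplace smoothing, the `UnigramNaiveBayes` class, and every example post below are assumptions made purely for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def unigrams(text):
    """Lowercase the text and split it into alphanumeric uni-gram tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class UnigramNaiveBayes:
    """Multinomial Naive Bayes over uni-gram counts (hypothetical helper)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.token_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = unigrams(text)
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        # Tokens never seen in training are ignored.
        tokens = [t for t in unigrams(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy posts; real Stack Overflow data would replace these.
TRAIN = [
    ("what are the requirements and constraints for a scalable messaging system", "analysis"),
    ("which quality requirements matter for a payment system", "analysis"),
    ("which architecture pattern should I choose microservices or layered design", "synthesis"),
    ("proposing a design with event sourcing pattern", "synthesis"),
    ("comparing performance tradeoffs of rest versus grpc benchmark", "evaluation"),
    ("how to evaluate tradeoffs between sql and nosql benchmark results", "evaluation"),
    ("how to implement a repository class in java code", "implementation"),
    ("code to implement singleton in python class", "implementation"),
]

texts, labels = zip(*TRAIN)
model = UnigramNaiveBayes().fit(texts, labels)
print(model.predict("benchmark tradeoffs of two databases"))
print(model.predict("implement a class in python"))
```

Naive Bayes needs only one counting pass over the training posts, which is consistent with the abstract's remark that NB is the most suitable classifier once computation cost is weighed against accuracy. The abstract does not say how that cost was measured.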

Updated: 2021-03-31