Automated demarcation of requirements in textual specifications: a machine learning-based approach, Empirical Software Engineering



Empirical Software Engineering (IF 4.1), Pub Date: 2020-09-13, DOI: 10.1007/s10664-020-09864-1
Sallam Abualhaija, Chetan Arora, Mehrdad Sabetzadeh, Lionel C. Briand, Michael Traynor

A simple but important task during the analysis of a textual requirements specification is to determine which statements in the specification represent requirements. In principle, by following suitable writing and markup conventions, one can provide an immediate and unequivocal demarcation of requirements at the time a specification is being developed. However, neither the presence nor a fully accurate enforcement of such conventions is guaranteed. The result is that, in many practical situations, analysts end up resorting to after-the-fact reviews for sifting requirements from other material in a requirements specification. This is both tedious and time-consuming. We propose an automated approach for demarcating requirements in free-form requirements specifications. The approach, which is based on machine learning, can be applied to a wide variety of specifications in different domains and with different writing styles. We train and evaluate our approach over an independently labeled dataset comprised of 33 industrial requirements specifications. Over this dataset, our approach yields an average precision of 81.2% and an average recall of 95.7%. Compared to simple baselines that demarcate requirements based on the presence of modal verbs and identifiers, our approach leads to an average gain of 16.4% in precision and 25.5% in recall. We collect and analyze expert feedback on the demarcations produced by our approach for industrial requirements specifications. The results indicate that experts find our approach useful and efficient in practice. We developed a prototype tool, named DemaRQ, in support of our approach. To facilitate replication, we make available to the research community this prototype tool alongside the non-proprietary portion of our training data.
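To make the baseline comparison concrete, the following is a minimal sketch of a rule-based demarcator that flags a statement as a requirement when it contains a modal verb or begins with an identifier, together with the precision and recall computation used to score it. The modal verb list, the identifier pattern, and the five example statements are assumptions chosen for illustration; the abstract does not specify them, and this is neither the authors' baseline implementation nor the DemaRQ tool.

```python
import re

# Assumed modal verbs; the abstract does not list which ones the baselines use.
MODAL_VERBS = {"shall", "must", "should", "will", "may"}

# Assumed identifier pattern (e.g. "REQ-7" or "R1.2" at the start of a statement).
IDENTIFIER = re.compile(r"^\s*(?:REQ|R)[-_ ]?\d+(?:\.\d+)*\b", re.IGNORECASE)


def baseline_is_requirement(statement: str) -> bool:
    """Flag a statement if it has a modal verb or a leading identifier."""
    words = set(re.findall(r"[a-z]+", statement.lower()))
    return bool(words & MODAL_VERBS) or bool(IDENTIFIER.search(statement))


def precision_recall(predicted, actual):
    """Compute precision and recall for the 'requirement' class."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical statements with hand-assigned labels (True = requirement).
labeled = [
    ("The system shall log every failed login attempt.", True),
    ("REQ-7 Operators receive an alert within 5 seconds.", True),
    ("This section may be skipped by readers familiar with the domain.", False),
    ("The controller restarts after a power failure.", True),
    ("Figure 3 gives an overview of the architecture.", False),
]

predicted = [baseline_is_requirement(text) for text, _ in labeled]
actual = [label for _, label in labeled]
precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this toy sample the rule produces one false positive (an informational sentence that happens to contain "may") and one false negative (a requirement written without a modal verb or identifier). These are the kinds of errors that motivate replacing surface rules with a learned classifier.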


Updated: 2020-09-13
