A systemic framework for crowdsourced test report quality assessment
Empirical Software Engineering (IF 4.1), Pub Date: 2020-02-27, DOI: 10.1007/s10664-019-09793-8
Xin Chen, He Jiang, Xiaochen Li, Liming Nie, Dongjin Yu, Tieke He, Zhenyu Chen

In crowdsourced mobile application testing, crowd workers perform test tasks for developers and submit test reports describing the abnormal behaviors they observe. These test reports usually provide important information for improving software quality. However, owing to workers' limited expertise and the inconvenience of editing on mobile devices, some test reports lack the information necessary to understand and reproduce the revealed bugs. Developers sometimes have to spend a significant share of their available resources handling low-quality test reports, which severely reduces inspection efficiency. In this paper, to help developers decide, within limited resources, whether a test report should be selected for inspection, we introduce a new problem: test report quality assessment. To model the quality of test reports, we propose a new framework named TERQAF. First, we systematically summarize desirable properties that characterize expected test reports and define a set of measurable indicators to quantify these properties. Then, we determine the numerical values of the indicators from the contents of the test reports. Finally, we train a classifier using logistic regression to predict the quality of test reports. To validate the effectiveness of TERQAF, we conduct extensive experiments on five crowdsourced test report datasets. Experimental results show that, on average, TERQAF achieves 85.18% Macro-average Precision (MacroP), 75.87% Macro-average Recall (MacroR), and 80.01% Macro-average F-measure (MacroF) in test report quality assessment. The empirical results also demonstrate that test report quality assessment helps developers handle test reports more efficiently.
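The MacroP, MacroR, and MacroF figures quoted above are computed by averaging per-class precision, recall, and F-measure with equal weight per class. The sketch below is not the authors' code; it is a minimal illustration of how such macro-averaged scores could be computed for a report-quality classifier, using hypothetical `good`/`bad` quality labels.

```python
def macro_prf(y_true, y_pred, labels):
    """Macro-averaged precision, recall, and F-measure.

    Computes precision/recall/F per class from true-positive,
    false-positive, and false-negative counts, then takes the
    unweighted mean over classes (the "macro" average).
    """
    precisions, recalls, f_measures = [], [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if (tp + fp) else 0.0
        rec = tp / (tp + fn) if (tp + fn) else 0.0
        f = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f_measures.append(f)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f_measures) / n


# Illustrative only: "good"/"bad" labels are an assumption, not the
# paper's dataset. In TERQAF's setting, y_pred would come from the
# logistic-regression classifier applied to the indicator values.
macro_p, macro_r, macro_f = macro_prf(
    y_true=["good", "good", "bad", "bad"],
    y_pred=["good", "bad", "bad", "bad"],
    labels=["good", "bad"],
)
```

Macro-averaging (rather than micro-averaging) gives each quality class equal influence on the score, so a classifier cannot look good merely by favoring the majority class — a sensible choice when low-quality reports are the minority but the costly case.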

Updated: 2020-02-27