Boundary Sampling to Boost Mutation Testing for Deep Learning Models, Information and Software Technology


Information and Software Technology (IF 3.9), Pub Date: 2020-09-25, DOI: 10.1016/j.infsof.2020.106413
Weijun Shen, Yanhui Li, Yuanlei Han, Lin Chen, Di Wu, Yuming Zhou, Baowen Xu

Context: The widespread application of Deep Learning (DL) models has raised concerns about their reliability. Because DL follows a data-driven programming paradigm, the quality of test datasets is extremely important for an accurate assessment of DL models. Recently, researchers have introduced mutation testing into DL testing: mutation operators are applied to generate mutants from DL models, and the quality of the test dataset is checked by observing whether the test data can identify those mutants. However, many factors (e.g., huge labeling effort and high running cost) still hinder the implementation of mutation testing for DL models.

Objective: We seek an approach for selecting a smaller, sensitive, representative and efficient subset of the whole test dataset to support current mutation testing for DL models (e.g., by reducing labeling and running cost).

Method: We propose boundary sample selection (BSS), which uses the distance of samples to the decision boundary of DL models as the indicator for constructing an appropriate subset. To evaluate the performance of BSS, we conduct an extensive empirical study with two widely used datasets, three popular DL models, and 14 up-to-date DL mutation operators.
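The abstract does not state how the distance to the decision boundary is measured. As a rough illustration only, the sketch below assumes the gap between the two largest predicted class probabilities as a proxy for that distance, and keeps the fraction of samples with the smallest gap. The function names, the `ratio` parameter, and the proxy itself are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def top2_margin(probs: Sequence[float]) -> float:
    """Gap between the two largest class probabilities.

    Assumption: a small gap means the sample lies near the decision boundary.
    This is a stand-in for the paper's distance indicator, not its definition.
    """
    ordered = sorted(probs, reverse=True)
    return ordered[0] - ordered[1]


def select_boundary_samples(prob_rows: Sequence[Sequence[float]],
                            ratio: float) -> List[int]:
    """Return indices of the `ratio` fraction of samples with the smallest margin."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    # Keep at least one sample, even for very small ratios.
    k = max(1, round(len(prob_rows) * ratio))
    # Rank sample indices from closest to farthest from the (proxied) boundary.
    ranked = sorted(range(len(prob_rows)),
                    key=lambda i: top2_margin(prob_rows[i]))
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Hypothetical softmax outputs of a 3-class model on four test inputs.
    probs = [
        [0.90, 0.05, 0.05],
        [0.40, 0.35, 0.25],
        [0.50, 0.45, 0.05],
        [0.70, 0.20, 0.10],
    ]
    print(select_boundary_samples(probs, ratio=0.5))
```

Under this proxy only the selected indices would need labels and mutant runs, which is where the labeling and running cost savings would come from.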

Results: We observe that (1) the subsets generated by BSS are much smaller (about 3%-20% of the whole test set); (2) under most mutation operators, our subsets are superior (about 9.94-21.63) to the whole test sets in observing mutation effects; (3) our subsets can replace the whole test sets to a very high degree (higher than 97%) when considering mutation score; (4) the MRR values of our proposed subsets are clearly better (about 2.28-13.19 times higher) than those of the whole test sets.
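The abstract does not define how the mutation score is computed. As a minimal sketch, the code below assumes a simple kill criterion: a mutant is killed if it misclassifies at least one test input that the original model classifies correctly, and the mutation score is the fraction of killed mutants. The function names and this criterion are assumptions for illustration and may differ from the definition used in the paper.

```python
from typing import Sequence


def is_killed(labels: Sequence[int],
              original_preds: Sequence[int],
              mutant_preds: Sequence[int]) -> bool:
    """Assumed kill criterion: the mutant gets wrong some input the original gets right."""
    return any(o == y and m != y
               for y, o, m in zip(labels, original_preds, mutant_preds))


def mutation_score(labels: Sequence[int],
                   original_preds: Sequence[int],
                   mutants_preds: Sequence[Sequence[int]]) -> float:
    """Fraction of mutants killed by the given test inputs."""
    if not mutants_preds:
        raise ValueError("at least one mutant is required")
    killed = sum(is_killed(labels, original_preds, m) for m in mutants_preds)
    return killed / len(mutants_preds)


if __name__ == "__main__":
    # Hypothetical labels and predictions for three test inputs and three mutants.
    labels = [0, 1, 2]
    original = [0, 1, 1]
    mutants = [[0, 1, 1], [1, 1, 1], [0, 1, 2]]
    print(mutation_score(labels, original, mutants))
```

Computing this score once on the whole test set and once on a selected subset gives two numbers that can be compared, which is one way to read observation (3).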

Conclusions: The results show that BSS can help testers save labeling cost, run mutation testing quickly, and identify killed mutants early.


Last updated: 2020-09-25
