Practical Accuracy Estimation for Efficient Deep Neural Network Testing
ACM Transactions on Software Engineering and Methodology (IF 6.6). Pub Date: 2020-07-07. DOI: 10.1145/3394112
Junjie Chen, Zhuo Wu, Zan Wang, Hanmo You, Lingming Zhang, Ming Yan

Deep neural networks (DNNs) have become increasingly popular, and DNN testing is critical to guaranteeing the correctness of a DNN, i.e., its accuracy in this work. However, DNN testing suffers from a serious efficiency problem: it is costly to label every test input in order to measure the DNN's accuracy on the testing set, because labeling each test input is a manual process involving multiple persons (often with domain-specific knowledge) and the testing set is large-scale. To relieve this problem, we propose a novel and practical approach, called PACE (Practical ACcuracy Estimation), which selects a small set of test inputs that can precisely estimate the accuracy of the whole testing set. In this way, labeling costs can be largely reduced by labeling only this small set of selected test inputs. Besides achieving a precise accuracy estimation, to make PACE more practical it should also be interpretable, deterministic, and as efficient as possible. Therefore, PACE first incorporates clustering to interpretably divide test inputs with different testing capabilities (i.e., testing different functionalities of a DNN model) into different groups. Then, PACE utilizes the MMD-critic algorithm, a state-of-the-art example-based explanation algorithm, to select prototypes (i.e., the most representative test inputs) from each group according to the group sizes, which reduces the impact of noise introduced by clustering. Meanwhile, PACE also borrows the idea of adaptive random testing to select test inputs from the minority space (i.e., the test inputs that are not clustered into any group) so as to achieve great diversity within the required number of test inputs. The two parallel selection processes (i.e., selection from the groups and from the minority space) compose the final small set of selected test inputs. We conducted an extensive study to evaluate the performance of PACE on a comprehensive benchmark of 24 pairs of DNN models and testing sets, considering different types of models (classification and regression, high-accuracy and low-accuracy, CNN and RNN) and different types of test inputs (original, mutated, and automatically generated). The results demonstrate that PACE precisely estimates the accuracy of the whole testing set with only 1.181%∼2.302% deviations on average, significantly outperforming the state-of-the-art approaches.
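The abstract describes a three-part selection pipeline (clustering, prototype selection, and diversity-driven sampling from the minority space) followed by accuracy estimation on the selected inputs. The sketch below is a minimal Python illustration of that idea, not the authors' implementation: scikit-learn's KMeans stands in for the paper's clustering step, nearest-to-centroid points stand in for MMD-critic prototypes, a greedy farthest-point rule approximates the adaptive-random-testing-style selection, and the outlier-based definition of the minority space, all function names, and all parameters are illustrative assumptions.

# A minimal sketch of a PACE-style pipeline (illustrative only; see the
# stand-ins and assumptions listed in the lead-in above).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances

def pace_like_select(features, budget, n_clusters=8, minority_quantile=0.95, seed=0):
    """Pick `budget` test-input indices: per-group prototypes plus a
    diverse sample of 'minority' (outlier-like) inputs."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(features)
    dist_to_center = np.linalg.norm(features - km.cluster_centers_[km.labels_], axis=1)

    # Stand-in for the "minority space": inputs far from every centroid
    # (PACE itself uses a clustering step that leaves such inputs unclustered).
    threshold = np.quantile(dist_to_center, minority_quantile)
    minority = np.where(dist_to_center > threshold)[0]
    clustered = np.where(dist_to_center <= threshold)[0]

    minority_budget = min(len(minority), max(1, budget // 10))
    cluster_budget = budget - minority_budget

    # 1) Prototype selection per group, proportional to group size.
    #    Nearest-to-centroid points stand in for MMD-critic prototypes here.
    selected = []
    member_labels = km.labels_[clustered]
    for c in range(n_clusters):
        members = clustered[member_labels == c]
        if len(members) == 0:
            continue
        share = max(1, round(cluster_budget * len(members) / len(clustered)))
        order = np.argsort(dist_to_center[members])
        selected.extend(members[order[:share]].tolist())
    selected = selected[:cluster_budget]

    # 2) Diversity-driven selection from the minority space, in the spirit of
    #    adaptive random testing: greedily take the candidate farthest from
    #    the points chosen so far.
    if minority_budget and len(minority):
        chosen, candidates = [minority[0]], list(minority[1:])
        while len(chosen) < minority_budget and candidates:
            d = pairwise_distances(features[candidates], features[chosen]).min(axis=1)
            chosen.append(candidates.pop(int(np.argmax(d))))
        selected.extend(int(i) for i in chosen)

    return np.asarray(selected[:budget])

def estimate_accuracy(selected_idx, predictions, true_labels):
    """Accuracy on the small labeled subset serves as the estimate for the whole set."""
    return float(np.mean(predictions[selected_idx] == true_labels[selected_idx]))

In this sketch, `features` would be some per-input representation of the test inputs (the abstract does not specify which representation PACE uses), and the reported 1.181%∼2.302% deviations correspond to the gap between such a sampled estimate and the accuracy measured on the fully labeled testing set.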

Updated: 2020-07-07