Learning to Encode and Classify Test Executions
arXiv - CS - Software Engineering Pub Date : 2020-01-08 , DOI: arxiv-2001.02444
Foivos Tsimpourlas, Ajitha Rajan, Miltiadis Allamanis

Automatically determining whether a test execution is correct is known as the test oracle problem, and it remains one of the key open issues in automated testing. The goal of this paper is to solve the test oracle problem in a way that is general, scalable and accurate. To achieve this, we use supervised learning over test execution traces. We label a small fraction of the execution traces with a verdict of pass or fail, and use the labelled traces to train a neural network (NN) model that learns to distinguish the runtime patterns of passing and failing executions for a given program. Our approach for building this NN model involves the following steps: (1) instrument the program to record execution traces as sequences of method invocations and global state; (2) label a small fraction of the execution traces with their verdicts; (3) design an NN component that embeds the information in execution traces into fixed-length vectors; (4) design an NN model that uses the trace information for classification; (5) evaluate the inferred classification model on unseen execution traces from the program. We evaluate our approach using case studies from different application domains: (1) a module from the Ethereum blockchain; (2) a module from the PyTorch deep learning framework; (3) components of the Microsoft SEAL encryption library; (4) the Sed stream editor; (5) a value pointer library; and (6) nine network protocols from the Linux packet identifier, L7-Filter. The classification models for all subject programs achieved precision, recall and specificity above 95%, while training on an average of only 9% of the total traces. Our experiments show that the proposed neural network model is highly effective as a test oracle and is able to learn runtime patterns that distinguish passing from failing test executions for systems and tests from different application domains.
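
The three measures reported in the abstract can be made concrete with a small sketch. It assumes that a failing execution is the positive class (the abstract does not say which class is positive), and the function name `oracle_metrics` and the toy verdict lists are hypothetical illustrations, not data or code from the paper.

```python
from typing import Dict, Sequence


def oracle_metrics(
    actual: Sequence[str],
    predicted: Sequence[str],
    positive: str = "fail",  # assumption: "fail" is treated as the positive class
) -> Dict[str, float]:
    """Compute precision, recall and specificity for pass/fail verdicts."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # failing trace correctly flagged as failing
        elif p == positive:
            fp += 1  # passing trace wrongly flagged as failing
        elif a == positive:
            fn += 1  # failing trace missed by the classifier
        else:
            tn += 1  # passing trace correctly classified as passing

    def ratio(num: int, den: int) -> float:
        # Return 0.0 when the denominator is empty instead of dividing by zero.
        return num / den if den else 0.0

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Toy verdicts for eight unseen traces (hypothetical, for illustration only).
actual = ["fail", "fail", "fail", "pass", "pass", "pass", "pass", "pass"]
predicted = ["fail", "fail", "pass", "pass", "pass", "pass", "pass", "fail"]
metrics = oracle_metrics(actual, predicted)
```

Under this convention, recall measures how many failing executions the learned oracle catches, while specificity measures how many passing executions it leaves unflagged.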

Updated: 2020-01-09