

Towards effective metamorphic testing by algorithm stability for linear classification programs
Journal of Systems and Software (IF 3.7), Pub Date: 2021-05-29, DOI: 10.1016/j.jss.2021.111012
Yingzhuo Yang, Zenan Li, Huiyan Wang, Chang Xu, Xiaoxing Ma

Quality assurance for machine learning systems is becoming increasingly critical. While much effort has been devoted to the models trained by such systems, we focus on the quality of the systems themselves, since it essentially determines the quality of the many models they train. In this article, we focus on detecting bugs in the implementations of one class of model-training systems, namely linear classification algorithms, a task known to be challenging owing to the lack of a test oracle. Existing work has attempted to use metamorphic testing to alleviate the oracle problem, but has fallen short by overlooking the statistical nature of such learning algorithms, leading to premature metamorphic relations (MRs) that suffer from efficacy and necessity issues. To address this problem, we first derive MRs, with a soundness guarantee, from a fundamental property of linear classification algorithms, namely algorithm stability. We then formulate these MRs as Past-execution Dependent MRs (PD-MRs), a form that is rarely used but that our field study and analysis suggest could be more effective, in contrast to the traditional and extensively studied form, Past-execution Independent MRs (PI-MRs). We experimentally evaluated our new MRs on nine well-known linear classification algorithms. The results show that the new MRs detected 37.6–329.2% more bugs than existing benchmark MRs.
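To make the idea of a stability-based metamorphic check concrete, the following is a minimal sketch and not one of the paper's MRs, which the abstract does not spell out. It assumes an L2-regularised logistic regression trained by plain gradient descent and checks that removing one training sample moves the learned weights by no more than ||x_k|| / lam, a bound that follows from the strong convexity of the regularised objective and the 1-Lipschitz logistic loss. The names `train_logreg`, `stability_mr_holds`, and `DATA`, the toy data set, and all parameter values are hypothetical.

```python
import math

def train_logreg(samples, lam=1.0, steps=2000):
    """Minimise sum_i log(1 + exp(-y_i * w.x_i)) + (lam / 2) * ||w||^2
    by gradient descent. Assumed reference trainer, not the paper's."""
    dim = len(samples[0][0])
    # Upper bound on the curvature of the objective; 1/smooth is a safe step size.
    smooth = lam + 0.25 * sum(sum(v * v for v in x) for x, _ in samples)
    w = [0.0] * dim
    for _ in range(steps):
        grad = [lam * wj for wj in w]              # gradient of the regulariser
        for x, y in samples:
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            coeff = -y / (1.0 + math.exp(margin))  # derivative of the logistic loss
            for j in range(dim):
                grad[j] += coeff * x[j]
        w = [wj - gj / smooth for wj, gj in zip(w, grad)]
    return w

def stability_mr_holds(samples, k, lam=1.0, tol=1e-6):
    """Hypothetical stability check: dropping sample k must shift the
    weights by at most ||x_k|| / lam (plus a small numerical tolerance)."""
    w_full = train_logreg(samples, lam)
    w_drop = train_logreg(samples[:k] + samples[k + 1:], lam)
    shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(w_full, w_drop)))
    bound = math.sqrt(sum(v * v for v in samples[k][0])) / lam
    return shift <= bound + tol, shift, bound

# Toy data (assumption): two features plus a constant 1.0 acting as a bias term.
DATA = [
    ([1.0, 2.0, 1.0], 1), ([2.0, 1.5, 1.0], 1), ([1.5, 2.5, 1.0], 1),
    ([-1.0, -1.5, 1.0], -1), ([-2.0, -1.0, 1.0], -1), ([-1.5, -2.5, 1.0], -1),
    ([0.5, -0.5, 1.0], 1), ([-0.5, 0.5, 1.0], -1),
]

for k in range(len(DATA)):
    ok, shift, bound = stability_mr_holds(DATA, k)
    print(f"drop sample {k}: shift={shift:.4f} bound={bound:.4f} ok={ok}")
```

The check needs no expected model as an oracle: it only compares two executions of the same trainer. An implementation under test would replace `train_logreg`, and a violation of the relation would point to a possible bug.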


Updated: 2021-06-02
