Automating Test Case Identification in Open Source Projects on GitHub,arXiv - CS - Software Engineering


Automating Test Case Identification in Open Source Projects on GitHub
arXiv - CS - Software Engineering, Pub Date: 2021-02-23, DOI: arxiv-2102.11678
Matej Madeja, Jaroslav Porubän, Michaela Bačíková, Matúš Sulír, Ján Juhár, Sergej Chodarev, Filip Gurbáľ

Software testing is a very important component of Quality Assurance (QA). Many researchers study the testing process in terms of tester motivation and how tests should or should not be written. However, these recommendations do not reveal how tests are actually written in real projects. In this paper the following was investigated: (i) the denotation of the word "test" in different natural languages; (ii) whether the word "test" correlates with the presence of test cases; and (iii) which testing frameworks are used most. The analysis was performed on 38 GitHub open source repositories carefully selected from a set of 4.3M GitHub projects. We analyzed 20,340 test cases in 803 classes manually and 170k classes using an automated approach. The results show that: (i) there is a weak correlation (r = 0.655) between the word "test" and the presence of test cases in a class; (ii) the proposed algorithm, which uses static file analysis, correctly detected 95% of test cases; (iii) 15% of the analyzed classes used a main() function, which represents regular Java programs that test the production code without using any third-party framework. The identification rate of such tests is very low due to implementation diversity. The results may be leveraged to identify and locate test cases in a repository more quickly, to understand practices in customized testing solutions, and to mine tests to improve program comprehension in the future.
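To make the two measured quantities concrete, here is a minimal Python sketch of what static test detection and the word-versus-test correlation could look like. It is not the authors' algorithm, which the abstract does not describe. The regular expressions, the `classify` and `pearson` helpers, and the four toy Java classes are all assumptions introduced for illustration.

```python
import re
from math import sqrt

# Hypothetical indicators; the paper's actual detection rules are not
# given in the abstract.
TEST_ANNOTATION = re.compile(r"@Test\b")  # JUnit-style annotation only
MAIN_METHOD = re.compile(r"public\s+static\s+void\s+main\s*\(")
TEST_WORD = re.compile(r"test", re.IGNORECASE)


def classify(class_name: str, source: str) -> dict:
    """Return simple static indicators for one Java class."""
    return {
        "has_test_word": bool(TEST_WORD.search(class_name)),
        "has_test_annotation": bool(TEST_ANNOTATION.search(source)),
        "has_main": bool(MAIN_METHOD.search(source)),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Toy data (invented for this sketch, not taken from the studied repositories).
samples = [
    ("UserServiceTest",
     "import org.junit.Test;\n"
     "public class UserServiceTest { @Test public void saves() {} }"),
    ("UserService",
     "public class UserService { public void save() {} }"),
    ("ManualCheck",
     "public class ManualCheck { public static void main(String[] args) {} }"),
    ("TestUtils",
     "public class TestUtils { static int one() { return 1; } }"),
]

results = [classify(name, src) for name, src in samples]

# Correlate "class name contains 'test'" with "class contains a test case".
word_flags = [int(r["has_test_word"]) for r in results]
case_flags = [int(r["has_test_annotation"]) for r in results]
r_value = pearson(word_flags, case_flags)
```

In this toy data, `TestUtils` carries the word but no test case, and `ManualCheck` tests through `main()` without the word or an annotation. These are the two kinds of mismatch that keep such a correlation below 1 and make framework-free tests hard to identify by name alone.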


Updated: 2021-02-24
