Using application benchmark call graphs to quantify and improve the practical relevance of microbenchmark suites
PeerJ Computer Science (IF 3.8), Pub Date: 2021-05-28, DOI: 10.7717/peerj-cs.548
Martin Grambow, Christoph Laaber, Philipp Leitner, David Bermbach

Performance problems in applications should ideally be detected as soon as they occur, i.e., directly when the causing code modification is added to the code repository. To this end, complex and cost-intensive application benchmarks or lightweight but less relevant microbenchmarks can be added to existing build pipelines to ensure performance goals. In this paper, we show how the practical relevance of microbenchmark suites can be improved and verified based on the application flow during an application benchmark run. We propose an approach to determine the overlap of common function calls between application benchmarks and microbenchmarks, describe a method which identifies redundant microbenchmarks, and present a recommendation algorithm which reveals relevant functions that are not yet covered by microbenchmarks. A microbenchmark suite optimized in this way can easily test, after every code change, all functions determined to be relevant by application benchmarks, thus significantly reducing the risk of undetected performance problems. Our evaluation using two time series databases shows that, depending on the specific application scenario, application benchmarks cover different functions of the system under test. Their respective microbenchmark suites cover between 35.62% and 66.29% of the functions called during the application benchmark, leaving substantial room for improvement. Through two use cases (removing redundancies in the microbenchmark suite and recommending yet uncovered functions), we decrease the total number of microbenchmarks and increase the practical relevance of both suites. Removing redundancies can significantly reduce the number of microbenchmarks (and thus also the execution time) to ~10% and ~23% of the original suites, whereas the recommendation identifies up to 26 and 14 previously uncovered functions to benchmark, improving relevance. By utilizing the differences and synergies of application benchmarks and microbenchmarks, our approach potentially enables effective software performance assurance with performance tests of multiple granularities.
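To make the three analysis steps from the abstract concrete, the following is a minimal sketch in Python, assuming the benchmark call graphs have already been flattened into sets of reached function names. All identifiers (coverage, remove_redundant, recommend, and the example benchmark names) are illustrative assumptions, not taken from the paper's artifact.

from typing import Dict, List, Set

def coverage(app_funcs: Set[str], micro_suites: Dict[str, Set[str]]) -> float:
    """Fraction of functions reached by the application benchmark that are
    also reached by at least one microbenchmark."""
    micro_funcs: Set[str] = set().union(*micro_suites.values())
    return len(app_funcs & micro_funcs) / len(app_funcs) if app_funcs else 0.0

def remove_redundant(app_funcs: Set[str],
                     micro_suites: Dict[str, Set[str]]) -> List[str]:
    """Greedy set-cover heuristic: keep a microbenchmark only if it covers a
    relevant (application-reached) function that the kept ones do not."""
    kept: List[str] = []
    covered: Set[str] = set()
    # Consider the microbenchmarks with the largest relevant overlap first.
    ranked = sorted(micro_suites.items(),
                    key=lambda kv: len(kv[1] & app_funcs), reverse=True)
    for name, funcs in ranked:
        gain = (funcs & app_funcs) - covered
        if gain:
            kept.append(name)
            covered |= gain
    return kept

def recommend(app_funcs: Set[str],
              micro_suites: Dict[str, Set[str]]) -> List[str]:
    """Relevant functions without any microbenchmark coverage, i.e.,
    candidates for new microbenchmarks."""
    micro_funcs: Set[str] = set().union(*micro_suites.values())
    return sorted(app_funcs - micro_funcs)

# Example with hypothetical data:
#   suites = {"BenchmarkWrite": {"parse", "insert"}, "BenchmarkParse": {"parse"}}
#   coverage({"parse", "insert", "flush"}, suites)          -> 2/3 ~ 0.67
#   remove_redundant({"parse", "insert", "flush"}, suites)  -> ["BenchmarkWrite"]
#   recommend({"parse", "insert", "flush"}, suites)         -> ["flush"]

Note that the greedy pass in remove_redundant is only a set-cover approximation of the redundancy analysis, and the alphabetical ordering in recommend is a placeholder; the paper works on full call graphs, so these helpers should be read as a conceptual starting point rather than the authors' implementation.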

Updated: 2021-05-28