Using Microbenchmark Suites to Detect Application Performance Changes

Paper title

Using Microbenchmark Suites to Detect Application Performance Changes

Authors

Martin Grambow, Denis Kovalev, Christoph Laaber, Philipp Leitner, David Bermbach

Abstract

Software performance changes are costly and often hard to detect pre-release. Similar to software testing frameworks, either application benchmarks or microbenchmarks can be integrated into quality assurance pipelines to detect performance changes before a new application version is released. Unfortunately, extensive benchmarking studies usually take several hours, which is problematic when dozens of daily code changes must be examined in detail; hence, trade-offs have to be made. Optimized microbenchmark suites, which include only a small subset of the full suite, are a potential solution to this problem, given that they still reliably detect the majority of application performance changes, such as an increased request latency. It is, however, unclear whether microbenchmarks and application benchmarks detect the same performance problems, and whether one can serve as a proxy for the other. In this paper, we explore whether microbenchmark suites can detect the same application performance changes as an application benchmark. To this end, we run extensive benchmark experiments with both the complete and the optimized microbenchmark suites of two time-series database systems, InfluxDB and VictoriaMetrics, and compare their results to those of corresponding application benchmarks. We do this for 70 and 110 commits, respectively. Our results show that it is possible to detect application performance changes using an optimized microbenchmark suite if frequent false-positive alarms can be tolerated.
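
The abstract describes two steps. First, a commit is flagged when its benchmark measurements change. Second, the microbenchmark alarms are judged against an application benchmark's verdicts. The Python sketch below illustrates that comparison under stated assumptions; it is not the authors' analysis method.

The following are all hypothetical choices made for illustration:
- the bootstrap confidence interval on the relative change of the mean,
- the 5% threshold,
- the "any microbenchmark flags the commit" rule,
- the function names `relative_change_ci`, `detects_change`, `suite_flags_commit`, and `confusion`.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple


def relative_change_ci(
    old: Sequence[float],
    new: Sequence[float],
    iterations: int = 2000,
    alpha: float = 0.01,
    seed: int = 0,
) -> Tuple[float, float]:
    """Bootstrap percentile CI for the relative change of the mean, new/old - 1."""
    rng = random.Random(seed)  # fixed seed keeps verdicts reproducible
    changes = []
    for _ in range(iterations):
        # Resample each measurement series (e.g. latencies) with replacement.
        old_sample = [rng.choice(old) for _ in old]
        new_sample = [rng.choice(new) for _ in new]
        changes.append(statistics.fmean(new_sample) / statistics.fmean(old_sample) - 1.0)
    changes.sort()
    lower = changes[int(alpha / 2 * iterations)]
    upper = changes[min(iterations - 1, int((1 - alpha / 2) * iterations))]
    return lower, upper


def detects_change(old: Sequence[float], new: Sequence[float],
                   threshold: float = 0.05, **kwargs) -> bool:
    """Hypothetical rule: flag only if the whole CI lies beyond +/- threshold."""
    lower, upper = relative_change_ci(old, new, **kwargs)
    return lower > threshold or upper < -threshold


def suite_flags_commit(
    results: Dict[str, Tuple[Sequence[float], Sequence[float]]],
    threshold: float = 0.05,
) -> bool:
    """A commit is flagged if any microbenchmark in the (optimized) suite flags it."""
    return any(detects_change(old, new, threshold) for old, new in results.values())


def confusion(micro_flags: List[bool], app_flags: List[bool]) -> Dict[str, int]:
    """Per-commit agreement, treating the application benchmark as the reference."""
    pairs = list(zip(micro_flags, app_flags))
    return {
        "tp": sum(m and a for m, a in pairs),          # both detect a change
        "fp": sum(m and not a for m, a in pairs),      # microbenchmark false alarm
        "fn": sum(a and not m for m, a in pairs),      # missed application change
        "tn": sum(not m and not a for m, a in pairs),  # both report no change
    }
```

In this sketch, the `fp` count corresponds to the false-positive alarms the abstract mentions. A reduced suite that keeps `fn` low while `fp` grows would match the trade-off reported above.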
