nanoBench: A Low-Overhead Tool for Running Microbenchmarks on x86 Systems
arXiv - CS - Performance Pub Date : 2019-11-08 , DOI: arxiv-1911.03282
Andreas Abel and Jan Reineke

We present nanoBench, a tool for evaluating small microbenchmarks using hardware performance counters on Intel and AMD x86 systems. Most existing tools and libraries are intended to benchmark either entire programs or program segments in the context of their execution within a larger program. In contrast, nanoBench is specifically designed to evaluate small, isolated pieces of code. Such code is common in microbenchmark-based hardware analysis techniques. Unlike previous tools, nanoBench can execute microbenchmarks directly in kernel space. This makes it possible to benchmark privileged instructions, and it enables more accurate measurements. The reading of the performance counters is implemented with minimal overhead, avoiding function calls and branches. As a consequence, nanoBench is precise enough to measure individual memory accesses. We illustrate the utility of nanoBench by means of two case studies. First, we briefly discuss how nanoBench has been used to determine the latency, throughput, and port usage of more than 13,000 instruction variants on recent x86 processors. Second, we show how to generate microbenchmarks to precisely characterize the cache architectures of eleven Intel Core microarchitectures. This includes the most comprehensive analysis of the employed cache replacement policies to date.

Updated: 2020-11-04