Performance Evaluation of Intel Optane Memory for Managed Workloads, ACM Transactions on Architecture and Code Optimization


ACM Transactions on Architecture and Code Optimization (IF 1.5), Pub Date: 2021-04-22, DOI: 10.1145/3451342
Shoaib Akram


Intel Optane memory offers non-volatility, byte addressability, and high capacity. It suits managed workloads that prefer large main memory heaps. We investigate Optane as the main memory for managed (Java) workloads, focusing on performance scalability. As the workload (core count) increases, we note Optane’s performance relative to DRAM. A few workloads incur a slight slowdown on Optane memory, which helps conserve limited DRAM capacity. Unfortunately, other workloads scale poorly beyond a few core counts. This article investigates scaling bottlenecks for Java workloads on Optane memory, analyzing the application, runtime, and microarchitectural interactions. Poorly scaling workloads allocate objects rapidly and access objects in Optane memory frequently. These characteristics slow down the mutator and substantially slow down garbage collection (GC). At the microarchitecture level, load, store, and instruction miss penalties rise. To regain performance, we partition heaps across DRAM and Optane memory, a hybrid that scales considerably better than Optane alone. We exploit state-of-the-art GC approaches to partition heaps. Unfortunately, existing GC approaches needlessly waste DRAM capacity because they ignore runtime behavior. This article also introduces performance impact-guided memory allocation (PIMA) for hybrid memories. PIMA maximizes Optane utilization, allocating in DRAM only if it improves performance. It estimates the performance impact of allocating heaps in either memory type by sampling. We target PIMA at graph analytics workloads, offering a novel performance estimation method and detailed evaluation. PIMA identifies workload phases that benefit from DRAM with high (94.33%) accuracy, incurring only a 2% sampling overhead. PIMA operates stand-alone or combines with prior approaches to offer new performance versus DRAM capacity trade-offs. This work opens up Optane memory to a real-life role as the main memory for Java workloads.
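The abstract's core idea, allocating in DRAM only when sampling suggests it improves performance, can be illustrated with a minimal sketch. This is not the paper's PIMA algorithm: the class name `PlacementSketch`, the `choose` method, the use of a sampled progress rate as the performance signal, and the `MIN_SPEEDUP` threshold are all hypothetical, chosen only to show the shape of such a decision.

```java
/** Hypothetical sketch of a sampling-based DRAM-vs-Optane placement decision. */
final class PlacementSketch {
    enum Memory { DRAM, OPTANE }

    // Assumed tuning knob (not from the paper): the minimum relative speedup
    // that justifies spending scarce DRAM capacity on a workload phase.
    static final double MIN_SPEEDUP = 1.05;

    /**
     * Chooses a memory type for a phase's heap from two sampled progress rates
     * (for example, work items per second), each measured briefly while the
     * phase allocated in DRAM and in Optane respectively.
     */
    static Memory choose(double dramRate, double optaneRate) {
        if (dramRate <= 0 || optaneRate <= 0) {
            // No usable sample: default to the high-capacity memory.
            return Memory.OPTANE;
        }
        double speedup = dramRate / optaneRate;
        return speedup >= MIN_SPEEDUP ? Memory.DRAM : Memory.OPTANE;
    }

    public static void main(String[] args) {
        System.out.println(choose(120.0, 100.0)); // 1.20x speedup, above threshold
        System.out.println(choose(101.0, 100.0)); // 1.01x speedup, below threshold
    }
}
```

Defaulting to Optane when no sample is usable mirrors the stated goal of maximizing Optane utilization; a real system would also have to account for the cost of sampling itself, which the abstract reports as about 2%.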


Updated: 2021-04-22
