Benchmarking a New Paradigm: An Experimental Analysis of a Real Processing-in-Memory Architecture


arXiv - CS - Performance. Pub Date: 2021-05-09. DOI: arxiv-2105.03814
Juan Gómez-Luna, Izzat El Hajj, Ivan Fernandez, Christina Giannoula, Geraldo F. Oliveira, Onur Mutlu

Many modern workloads, such as neural networks, databases, and graph processing, are fundamentally memory-bound. For such workloads, the data movement between main memory and CPU cores imposes a significant overhead in terms of both latency and energy. A major reason is that this communication happens through a narrow bus with high latency and limited bandwidth, and the low data reuse in memory-bound workloads is insufficient to amortize the cost of main memory access. Fundamentally addressing this data movement bottleneck requires a paradigm where the memory system assumes an active role in computing by integrating processing capabilities. This paradigm is known as processing-in-memory (PIM). Recent research explores different forms of PIM architectures, motivated by the emergence of new 3D-stacked memory technologies that integrate memory with a logic layer where processing elements can be easily placed. Past works evaluate these architectures in simulation or, at best, with simplified hardware prototypes. In contrast, the UPMEM company has designed and manufactured the first publicly-available real-world PIM architecture. This paper provides the first comprehensive analysis of the first publicly-available real-world PIM architecture. We make two key contributions. First, we conduct an experimental characterization of the UPMEM-based PIM system using microbenchmarks to assess various architecture limits such as compute throughput and memory bandwidth, yielding new insights. Second, we present PrIM, a benchmark suite of 16 workloads from different application domains (e.g., linear algebra, databases, graph processing, neural networks, bioinformatics).


Updated: 2021-05-11
