SPARTA: A Divide and Conquer Approach to Address Translation for Accelerators
arXiv - CS - Hardware Architecture. Pub Date: 2020-01-20. DOI: arxiv-2001.07045
Javier Picorel, Seyed Alireza Sanaee Kohroudi, Zi Yan, Abhishek Bhattacharjee, Babak Falsafi, Djordje Jevdjic

Virtual memory (VM) is critical to the usability and programmability of hardware accelerators. Unfortunately, implementing accelerator VM efficiently is challenging because area and power constraints make it difficult to employ the large multi-level TLBs used in general-purpose CPUs. Recent research proposals advocate a number of restrictions on virtual-to-physical address mappings in order to reduce the TLB size or increase its reach. However, such restrictions are unattractive because they forgo many of the original benefits of traditional VM, such as demand paging and copy-on-write. We propose SPARTA, a divide-and-conquer approach to address translation. SPARTA splits address translation into accelerator-side and memory-side parts. The accelerator-side translation hardware consists of a tiny TLB covering only the accelerator's cache hierarchy (if any), while translation for main-memory accesses is performed by shared memory-side TLBs. Performing translation for memory accesses on the memory side allows SPARTA to overlap data fetch with translation, and avoids the replication of TLB entries for data shared among accelerators. To further improve the performance and efficiency of memory-side translation, SPARTA logically partitions the memory space, delegating translation to small and efficient per-partition translation hardware. Our evaluation on index-traversal accelerators shows that SPARTA virtually eliminates translation overhead, reducing it by over 30x on average (up to 47x) and improving performance by 57%. At the same time, SPARTA requires minimal accelerator-side translation hardware, reduces the total number of TLB entries in the system, gracefully scales with memory size, and preserves all key VM functionalities.
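The split-translation idea in the abstract can be illustrated with a toy software model. This is a minimal sketch, not the paper's hardware design: the TLB sizes, the LRU policy, the modulo partitioning function, and all class and method names (`TinyTLB`, `SpartaTranslator`, `partition_of`) are assumptions for illustration. A lookup first consults a tiny accelerator-side TLB; on a miss, it is delegated to the memory-side TLB owning that page's partition, which falls back to a page-table walk.

```python
# Toy model of SPARTA-style split address translation.
# All sizes, names, and the partitioning function are illustrative
# assumptions, not the mechanism from the paper.

PAGE_SIZE = 4096
NUM_PARTITIONS = 4  # memory space is logically partitioned


class TinyTLB:
    """Small fully-associative TLB with LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # vpn -> ppn; dict insertion order tracks LRU

    def lookup(self, vpn):
        if vpn in self.entries:
            ppn = self.entries.pop(vpn)  # refresh LRU position
            self.entries[vpn] = ppn
            return ppn
        return None

    def insert(self, vpn, ppn):
        if len(self.entries) >= self.capacity:
            self.entries.pop(next(iter(self.entries)))  # evict LRU entry
        self.entries[vpn] = ppn


class SpartaTranslator:
    """Accelerator-side tiny TLB backed by per-partition memory-side TLBs.

    In the real design the memory-side lookup would overlap with the
    data fetch; this sequential model only shows the lookup structure.
    """

    def __init__(self, page_table):
        self.page_table = page_table          # vpn -> ppn (OS-managed)
        self.accel_tlb = TinyTLB(capacity=8)  # tiny accelerator-side TLB
        self.mem_tlbs = [TinyTLB(capacity=64) for _ in range(NUM_PARTITIONS)]

    def partition_of(self, vpn):
        # Each page is owned by exactly one memory-side TLB, so entries
        # for shared data are never replicated across accelerators.
        return vpn % NUM_PARTITIONS

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        ppn = self.accel_tlb.lookup(vpn)
        if ppn is None:
            side_tlb = self.mem_tlbs[self.partition_of(vpn)]
            ppn = side_tlb.lookup(vpn)
            if ppn is None:
                ppn = self.page_table[vpn]  # page-table walk on a miss
                side_tlb.insert(vpn, ppn)
            self.accel_tlb.insert(vpn, ppn)
        return ppn * PAGE_SIZE + offset
```

For example, with an identity-plus-offset page table, `SpartaTranslator({5: 105}).translate(5 * PAGE_SIZE + 7)` returns `105 * PAGE_SIZE + 7`, filling both the owning memory-side TLB and the accelerator-side TLB along the way.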

Updated: 2020-01-22