Approximate Memory Compression
IEEE Transactions on Very Large Scale Integration (VLSI) Systems (IF 2.8), Pub Date: 2020-04-01, DOI: 10.1109/tvlsi.2020.2970041
Ashish Ranjan , Arnab Raha , Vijay Raghunathan , Anand Raghunathan

Memory subsystems are a major energy bottleneck in computing platforms due to frequent transfers between processors and off-chip memory. We propose approximate memory compression, a technique that leverages the intrinsic resilience of emerging workloads such as machine learning and data analytics to reduce off-chip memory traffic, thereby improving energy and performance. We realize approximate memory compression by enhancing the memory controller to be aware of approximate memory regions—regions in memory that contain approximation-resilient data—and to transparently compress (decompress) the data written to (read from) these regions. To provide control over approximations, each approximate memory region is associated with an error constraint such as the maximum error that may be introduced in each data element. The quality-aware memory controller subjects memory transactions to a compression scheme that introduces approximations, thereby reducing memory traffic, while adhering to the specified error constraint for each approximate memory region. A software interface is provided to allow programmers to identify data structures (DSs) that are resilient to approximations. A runtime quality control framework automatically determines the error constraints for the identified DSs such that a given target application-level quality is maintained. We evaluate our proposal by applying it to three different main memory technologies in the context of a general-purpose computing system—DDR3 DRAM, LPDDR3 DRAM, and spin-transfer torque magnetic RAM (STT-MRAM). To demonstrate the feasibility of the proposed concepts, we also implement a hardware prototype using the Intel UniPHY-DDR3 memory controller and Nios-II processor, a Hynix DDR3 DRAM module, and a Stratix-IV field-programmable gate array (FPGA) development board. 
Across a wide range of machine learning benchmarks, approximate memory compression obtains significant benefits in main memory energy (1.18× for DDR3 DRAM, 1.52× for LPDDR3 DRAM, and 2.0× for STT-MRAM) and a simultaneous improvement in execution time (5.2% for DDR3 DRAM, 5.4% for LPDDR3 DRAM, and 9.3% for STT-MRAM) with nearly identical application output quality.
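The paper's compression is performed in hardware by the quality-aware memory controller; the details of its scheme are not given in the abstract. As a minimal software sketch of the core idea, error-constrained lossy compression of a data element can be illustrated with uniform quantization, where a per-region `max_error` bound guarantees that each reconstructed element deviates from the original by at most that amount (all function names here are hypothetical, not from the paper):

```python
import numpy as np

def approx_compress(data, max_error):
    """Quantize each element with step 2*max_error, so round-to-nearest
    reconstruction error is at most max_error per element. Coarser steps
    yield fewer distinct codes, i.e., fewer bits per element to transfer."""
    step = 2.0 * max_error
    codes = np.round(data / step).astype(np.int32)  # small-integer codes
    return codes, step

def approx_decompress(codes, step):
    # Reconstruct approximate values from the integer codes.
    return codes * step

# Example: a small "approximate memory region" with max_error = 0.05.
data = np.array([0.93, -1.47, 2.08, 0.11])
codes, step = approx_compress(data, max_error=0.05)
recon = approx_decompress(codes, step)
# Every element honors the error constraint.
assert np.all(np.abs(recon - data) <= 0.05 + 1e-12)
```

This is only an analogy for the error-constraint contract: the controller may use any scheme, as long as writes to an approximate region are compressed within the region's bound and reads are transparently decompressed.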

Updated: 2020-04-01