MViD: Sparse Matrix-Vector Multiplication in Mobile DRAM for Accelerating Recurrent Neural Networks
IEEE Transactions on Computers (IF 3.6) Pub Date: 2020-04-02, DOI: 10.1109/tc.2020.2984496
Byeongho Kim, Jongwook Chung, Eojin Lee, Wonkyung Jung, Sunjung Lee, Jaewan Choi, Jaehyun Park, Minbok Wi, Sukhan Lee, Jung Ho Ahn

Recurrent Neural Networks (RNNs) spend most of their execution time performing matrix-vector multiplication (MV-mul). Because the matrices in RNNs have poor reusability and their ever-increasing size exceeds the on-chip storage of mobile/IoT devices, the performance and energy efficiency of MV-mul are determined by those of main-memory DRAM. Therefore, computing MV-mul within DRAM has drawn much attention. However, previous studies did not consider matrix sparsity, the power constraints of DRAM devices, or concurrent DRAM accesses from processors while MV-mul is in progress. We propose a main-memory architecture called MViD, which performs MV-mul by placing MAC units inside DRAM banks. For higher computational efficiency, we use a sparse matrix format and exploit quantization. Because of the limited power budget for DRAM devices, we implement the MAC units on only a portion of the DRAM banks. We architect MViD to slow down or pause MV-mul so that memory requests from processors can be served concurrently while the limited power budget is still satisfied. Our results show that MViD provides 7.2× higher throughput than the baseline system with four DRAM ranks (which performs MV-mul on a chip-multiprocessor) while running Deep Speech 2 inference alongside a memory-intensive workload.
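To make the operation concrete, below is a minimal software sketch of the quantized sparse MV-mul that MViD's in-bank MAC units compute. The abstract states only that a sparse matrix format and quantization are used; the CSR layout, int8 weights and activations, per-tensor scales, and the spmv_int8 name are illustrative assumptions, not the paper's actual encoding.

#include <stdint.h>

/* A CSR-style sparse matrix with int8-quantized weights (assumed layout;
 * the paper's exact sparse format is not given in the abstract). */
typedef struct {
    int            rows;
    const int32_t *row_ptr;  /* rows+1 entries: start of each row in cols/vals */
    const int32_t *col_idx;  /* column index of each stored nonzero */
    const int8_t  *vals;     /* quantized nonzero weights */
    float          scale;    /* dequantization scale for the weights */
} csr_int8_t;

/* y = A * x: the per-row multiply-accumulate loop that MViD maps onto
 * MAC units inside DRAM banks. Products accumulate in int32, so each
 * output element needs only one floating-point dequantization step. */
void spmv_int8(const csr_int8_t *A, const int8_t *x, float x_scale, float *y)
{
    for (int r = 0; r < A->rows; r++) {
        int32_t acc = 0;  /* MAC accumulator */
        for (int32_t k = A->row_ptr[r]; k < A->row_ptr[r + 1]; k++)
            acc += (int32_t)A->vals[k] * (int32_t)x[A->col_idx[k]];
        y[r] = (float)acc * A->scale * x_scale;  /* dequantize once per row */
    }
}

Because only the nonzero weights are stored and fetched, the work per output row scales with that row's nonzeros rather than the full matrix width, which is what makes in-bank execution attractive once the matrix no longer fits on-chip.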
