ADC-PIM: Accelerating Convolution on the GPU via In-Memory Approximate Data Comparison



IEEE Journal on Emerging and Selected Topics in Circuits and Systems (IF 3.7), Pub Date: 2022-04-14, DOI: 10.1109/jetcas.2022.3167391
Jungwoo Choi, Hyuk-Jae Lee, Chae Eun Rhee


Convolutional neural networks (CNNs) are now widely used in image processing and computer vision. GPUs are often used to accelerate CNNs, but performance is limited by the high computational cost and memory usage of convolution. Prior studies exploited approximate computing to reduce the computational cost. However, they reduced only the amount of computation, so performance remained bottlenecked by memory bandwidth because of the increased memory intensity. In addition, load imbalance between warps caused by the approximation inhibits performance improvement. In this paper, we propose a processing-in-memory (PIM) solution, ADC-PIM, that reduces both data movement and computation through approximate data comparison. Instead of determining value similarity after the data is loaded to the GPU, the ADC-PIM unit located on 3D-stacked memory compares the similarity and transfers only the selected representative data to the GPU. The GPU performs convolution on the representative data transferred from the ADC-PIM and reuses the calculated results based on the similarity information. To limit the increase in memory latency caused by in-memory data comparison, we propose a two-level PIM architecture that exploits both the DRAM bank and the TSV stage. By dividing the comparisons across multiple banks and then merging the results at the TSV stage, the ADC-PIM effectively hides the delay caused by the comparisons. To ease load balancing on the GPU, the ADC-PIM reorganizes data by assigning the representative data to addresses computed from the comparison results. Experimental results show that the proposed ADC-PIM provides a 43% speedup and 32% energy saving with less than a 1% accuracy drop.
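The data flow described in the abstract (compare values, keep one representative per group of similar inputs, convolve only the representatives, then reuse the results) can be illustrated with a small host-side sketch. This is not the authors' implementation: the similarity test (every element within a tolerance `tol`), the greedy selection order, and the names `is_similar`, `select_representatives`, and `conv_with_reuse` are all assumptions made for illustration, and the sketch runs on the CPU rather than in 3D-stacked memory.

```python
from typing import List, Sequence, Tuple

Patch = Sequence[float]


def is_similar(a: Patch, b: Patch, tol: float) -> bool:
    """Assumed similarity test: every element pair differs by at most tol."""
    return all(abs(x - y) <= tol for x, y in zip(a, b))


def select_representatives(
    patches: List[Patch], tol: float
) -> Tuple[List[Patch], List[int]]:
    """Greedy stand-in for the in-memory comparison step (hypothetical).

    Returns the representative patches, packed contiguously, and for each
    input patch the index of the representative that stands in for it.
    """
    reps: List[Patch] = []
    rep_index: List[int] = []
    for patch in patches:
        for i, rep in enumerate(reps):
            if is_similar(patch, rep, tol):
                rep_index.append(i)  # reuse an existing representative
                break
        else:
            reps.append(patch)  # no match found: patch becomes a representative
            rep_index.append(len(reps) - 1)
    return reps, rep_index


def conv_with_reuse(
    patches: List[Patch], kernel: Sequence[float], tol: float
) -> Tuple[List[float], int]:
    """Apply the kernel to representatives only, then scatter the results."""
    reps, rep_index = select_representatives(patches, tol)
    # One dot product per representative instead of one per input patch.
    rep_out = [sum(x * w for x, w in zip(rep, kernel)) for rep in reps]
    # Reuse each representative's result for every patch mapped to it.
    outputs = [rep_out[i] for i in rep_index]
    return outputs, len(reps)
```

With a tolerance of 0.05, for instance, the patches `[1.0, 2.0, 3.0]` and `[1.01, 2.0, 2.99]` map to the same representative, so the kernel is applied once and the result is reused for both. Packing the representatives into a contiguous list loosely mirrors the data reorganization the abstract mentions, although the address computation used in the paper is not specified here.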


Updated: 2024-08-26
