Attention-Based Activation Pruning to Reduce Data Movement in Real-Time AI: A Case-Study on Local Motion Planning in Autonomous Vehicles
IEEE Journal on Emerging and Selected Topics in Circuits and Systems (IF 3.7) Pub Date: 2020-08-12, DOI: 10.1109/jetcas.2020.3015889
Kruttidipta Samal, Marilyn Wolf, Saibal Mukhopadhyay

In state-of-the-art deep neural networks (DNNs), the layer-wise activation maps lead to significant data movement in hardware accelerators operating on real-time streaming inputs. We explore an architecture-aware algorithmic approach to reduce data movement and the resulting latency and power consumption. This article presents an attention-based feedback mechanism for controlling input data, referred to as activation pruning, that reduces the activation maps in the early layers of a DNN, which are critical for reducing data movement in real-time AI processing. The proposed approach is demonstrated by coupling RGB and Lidar images to perform real-time perception and local motion planning in autonomous systems. Lidar data is used to determine "Pixels of Interest" (PoI) in an RGB image based on their distance from the sensor; the RGB image is then pruned so that object detection runs only within the PoI, and the detected objects are used for local motion planning. Experiments on sequences from the KITTI dataset show that activation pruning maintains the quality of motion planning while increasing the sparsity of the activation maps. Sparsity-aware computing architectures are considered to leverage this activation sparsity for improved performance. The simulation results show that the proposed activation pruning algorithm reduces data movement (38.5%), computational load (30.1%), and memory latency (76.3%) in a sparsity-aware compute architecture, leading to faster perception and lower energy consumption.
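
The pruning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the Lidar returns have already been projected into a per-pixel depth map aligned with the RGB image, and it uses a plain distance threshold as the PoI rule. The function names, the 30 m threshold, and the toy data are all hypothetical, and the attention-based feedback loop itself is not modeled.

```python
import numpy as np


def pixels_of_interest(depth_m, max_range_m):
    """Boolean mask of pixels with a valid Lidar depth within max_range_m.

    Assumption: depth_m is an (H, W) array already aligned with the RGB
    image; pixels without a Lidar return are stored as inf.
    """
    return np.isfinite(depth_m) & (depth_m > 0) & (depth_m <= max_range_m)


def prune_rgb(rgb, poi_mask):
    """Zero out every RGB pixel that lies outside the PoI mask."""
    return rgb * poi_mask[..., None].astype(rgb.dtype)


def sparsity(x):
    """Fraction of entries in x that are exactly zero."""
    return float(np.mean(x == 0))


# Toy 4x6 frame: one nearby object and one distant return (hypothetical data).
rgb = np.full((4, 6, 3), 200, dtype=np.uint8)
depth = np.full((4, 6), np.inf)
depth[1:3, 2:5] = 12.0  # object at 12 m, inside the assumed range
depth[0, 0] = 80.0      # return at 80 m, outside the assumed range

mask = pixels_of_interest(depth, max_range_m=30.0)
pruned = prune_rgb(rgb, mask)
print(mask.sum(), sparsity(pruned))
```

In this toy frame only the six pixels of the nearby object survive, so three quarters of the input is zero before it reaches the first layer of the detector. That input sparsity is what a sparsity-aware accelerator could exploit.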

Updated: 2020-08-12