AccSS3D: Accelerator for Spatially Sparse 3D DNNs
arXiv - CS - Hardware Architecture. Pub Date: 2020-11-25, DOI: arXiv:2011.12669
Om Ji Omer, Prashant Laddha, Gurpreet S Kalsi, Anirud Thyagharajan, Kamlesh R Pillai, Abhimanyu Kulkarni, Anbang Yao, Yurong Chen, Sreenivas Subramoney

Semantic understanding and completion of real-world scenes is a foundational primitive of 3D visual perception, widely used in high-level applications such as robotics, medical imaging, autonomous driving, and navigation. Due to the curse of dimensionality, the compute and memory requirements for 3D scene understanding grow cubically with voxel resolution, posing a huge impediment to real-time, energy-efficient deployments. The inherent spatial sparsity of the 3D world due to free space is fundamentally different from the channel-wise sparsity that has been extensively studied. We present the Accelerator for Spatially Sparse 3D DNNs (AccSS3D), the first end-to-end solution for accelerating 3D scene understanding by exploiting this ample spatial sparsity. As an algorithm-dataflow-architecture co-designed system specialized for spatially sparse 3D scene understanding, AccSS3D includes novel spatial-locality-aware metadata structures, a near-zero-latency and spatial-sparsity-aware dataflow optimizer, a surface-orientation-aware point-cloud reordering algorithm, and a co-designed hardware accelerator for spatial sparsity that exploits data reuse through systolic and multicast interconnects. The SSpNNA accelerator core, together with 64 KB of L1 memory, requires 0.92 mm² of area in a 16 nm process at 1 GHz. Overall, AccSS3D achieves a 16.8x speedup and a 2232x energy-efficiency improvement for 3D sparse convolution compared to an Intel i7-8700K 4-core CPU, which translates to an 11.8x end-to-end 3D semantic segmentation speedup and a 24.8x energy-efficiency improvement (iso technology node).
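To make the spatial-sparsity idea in the abstract concrete, here is a minimal toy sketch (not the AccSS3D implementation, whose metadata structures and dataflow the abstract only names): a submanifold-style sparse 3D convolution that stores only occupied voxels in a hash map keyed by coordinates, so free-space voxels are never visited and work scales with the number of active sites rather than the cubic volume. All names and shapes here are illustrative assumptions.

```python
import numpy as np

def sparse_conv3d(active, weights, kernel=3):
    """Toy submanifold-style sparse 3D convolution (illustrative only).

    active  : dict mapping (x, y, z) voxel coords -> feature vector (C_in,)
    weights : array of shape (kernel, kernel, kernel, C_in, C_out)
    Outputs are produced only at the input's active sites, so free-space
    voxels contribute neither compute nor memory traffic.
    """
    r = kernel // 2
    out = {}
    for (x, y, z) in active:
        acc = np.zeros(weights.shape[-1])
        # Gather only neighbors that are actually occupied.
        for dx in range(-r, r + 1):
            for dy in range(-r, r + 1):
                for dz in range(-r, r + 1):
                    nb = active.get((x + dx, y + dy, z + dz))
                    if nb is not None:
                        acc += nb @ weights[dx + r, dy + r, dz + r]
        out[(x, y, z)] = acc
    return out

# A tiny scene: 3 occupied voxels in an otherwise empty volume.
rng = np.random.default_rng(0)
active = {(0, 0, 0): rng.normal(size=4),
          (1, 0, 0): rng.normal(size=4),
          (10, 10, 10): rng.normal(size=4)}
w = rng.normal(size=(3, 3, 3, 4, 8))
out = sparse_conv3d(active, w)
print(len(out))  # one output per occupied voxel, regardless of volume size
```

A dense 3D convolution over the bounding volume here would touch 11x11x11 voxels; the sparse version touches only the 3 occupied ones plus their (mostly empty) neighborhoods, which is the gap an accelerator like the one described can exploit.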

Updated: 2020-11-27