Enabling Highly Efficient Capsule Networks Processing Through Software-Hardware Co-Design, IEEE Transactions on Computers



IEEE Transactions on Computers (IF 3.7), Pub Date: 2021-02-03, DOI: 10.1109/tc.2021.3056929
Xingyao Zhang, Xin Fu, Donglin Zhuang, Chenhao Xie, Shuaiwen Song

As the demand for image processing increases, image features have become increasingly complex. Although Convolutional Neural Networks (CNNs) have been widely adopted for image processing tasks, they have been found to be easily misled because of their extensive use of pooling operations. A novel neural network structure, Capsule Networks (CapsNet), has been proposed to address this CNN limitation and fundamentally enhance learning ability for image segmentation and object detection. Since CapsNet involves a large volume of matrix operations, it is generally accelerated on modern GPU platforms with highly optimized deep-learning libraries. However, the routing procedure of CapsNet introduces special program and execution characteristics, including massive unshareable intermediate variables and intensive synchronizations, which make CapsNet execution inefficient on modern GPUs. To address these challenges, we propose SH-CapsNet, a set of software-hardware co-designed optimizations that includes software-level optimizations named S-CapsNet and a hybrid computing architecture named PIM-CapsNet. At the software level, S-CapsNet reduces computation and memory accesses by exploiting the computational redundancy and data similarity of the routing procedure. At the hardware level, PIM-CapsNet leverages the processing-in-memory capability of today's 3D-stacked memory to provide an off-chip, in-memory acceleration solution for the routing procedure, while pipelining with the GPU's on-chip computing capability to accelerate the CNN-type layers in CapsNet. Evaluation results demonstrate that each of our software and hardware optimizations can significantly improve CapsNet execution efficiency. Together, our co-design achieves substantial improvements in both performance ($3.41\times$) and energy savings (68.72 percent) for CapsNet inference, with negligible accuracy loss.
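For readers unfamiliar with the routing procedure the abstract refers to, the sketch below implements dynamic routing-by-agreement as introduced by Sabour et al. (2017) for the original CapsNet. It is an illustration under stated assumptions, not the SH-CapsNet, S-CapsNet, or PIM-CapsNet implementation: the function names (`squash`, `softmax`, `dynamic_routing`), the pure-Python list representation, the toy inputs, and the three-iteration default are choices made here. It shows where the per-input intermediate variables (routing logits `b`, coupling coefficients `c`, weighted sums `s`) arise and where each iteration depends on reductions across all capsules, which is one plausible reading of the "unshareable intermediate variables" and "intensive synchronizations" the abstract mentions.

```python
import math
from typing import List

Vector = List[float]


def squash(s: Vector) -> Vector:
    """Squash nonlinearity: keeps the direction of s, maps its length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = (norm_sq / (1.0 + norm_sq)) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat: List[List[Vector]], iterations: int = 3) -> List[Vector]:
    """Routing-by-agreement; u_hat[i][j] is lower capsule i's prediction for upper capsule j.

    Hypothetical helper written for illustration only.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    # Routing logits b[i][j]: derived from this input's predictions, so they
    # cannot be shared across inputs the way layer weights can.
    b = [[0.0] * n_out for _ in range(n_in)]
    v: List[Vector] = []
    for _ in range(iterations):
        # Coupling coefficients: softmax over all upper capsules j for each i.
        c = [softmax(row) for row in b]
        # Weighted sum over ALL lower capsules i: this reduction must complete
        # before any upper capsule output can be squashed.
        s = [
            [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            for j in range(n_out)
        ]
        v = [squash(s_j) for s_j in s]
        # Agreement update: raise b[i][j] when prediction u_hat[i][j] aligns with v[j].
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v


# Toy usage (hypothetical data): three lower capsules, two upper capsules,
# 2-D capsule vectors. All three predictions for upper capsule 0 point the same way.
u_hat = [
    [[1.0, 0.0], [0.0, 0.1]],
    [[0.9, 0.1], [0.1, 0.0]],
    [[1.1, -0.1], [-0.1, 0.0]],
]
v = dynamic_routing(u_hat)
lengths = [math.sqrt(sum(x * x for x in v_j)) for v_j in v]
print(lengths)  # upper capsule 0 ends up with the longer output vector
```

Each iteration's softmax and weighted sum consume values that the previous iteration produced for every capsule pair, so the iterations run strictly in sequence; on a GPU, reductions of this kind typically require synchronization across threads.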


Updated: 2021-03-16
