A Unified Hardware Architecture for Convolutions and Deconvolutions in CNN
arXiv - CS - Hardware Architecture | Pub Date: 2020-05-29 | DOI: arxiv-2006.00053 | Lin Bai, Yecheng Lyu and Xinming Huang
This paper proposes a scalable neural network hardware architecture for image
segmentation. Convolution and deconvolution operations share the same computing
resources and are both handled by a single processing element array. In
addition, access to on-chip and off-chip memories is optimized to alleviate the
burden introduced by partial sums. As an example, SegNet-Basic has been
implemented with the proposed unified architecture on a Xilinx ZC706 FPGA,
achieving 151.5 GOPS for convolution and 94.3 GOPS for deconvolution. This
unified convolution/deconvolution design is applicable to other CNNs with
deconvolution.
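To illustrate how one compute kernel can serve both operations, the sketch below uses a common textbook identity: a transposed convolution (deconvolution) equals an ordinary stride-1 convolution applied to a zero-inserted, zero-padded input with a flipped kernel. This is only an assumption about how resource sharing might work in principle. The abstract does not describe the mapping the authors actually use, and the function names (`conv2d`, `deconv2d_via_conv`, `deconv2d_reference`) are hypothetical.

```python
def conv2d(x, w):
    """Valid, stride-1 2D cross-correlation (the 'convolution' used in CNNs)."""
    H, W, K = len(x), len(x[0]), len(w)
    out = [[0] * (W - K + 1) for _ in range(H - K + 1)]
    for i in range(H - K + 1):
        for j in range(W - K + 1):
            acc = 0  # partial sum accumulated by one multiply-accumulate unit
            for a in range(K):
                for b in range(K):
                    acc += x[i + a][j + b] * w[a][b]
            out[i][j] = acc
    return out


def deconv2d_via_conv(x, w, stride):
    """Transposed convolution that reuses conv2d as its only compute kernel."""
    H, W, K = len(x), len(x[0]), len(w)
    pad = K - 1
    Hz, Wz = (H - 1) * stride + 1, (W - 1) * stride + 1
    # Insert (stride - 1) zeros between input pixels and pad by K - 1 on each side.
    z = [[0] * (Wz + 2 * pad) for _ in range(Hz + 2 * pad)]
    for i in range(H):
        for j in range(W):
            z[pad + i * stride][pad + j * stride] = x[i][j]
    # Flip the kernel in both dimensions.
    w_flipped = [row[::-1] for row in w[::-1]]
    return conv2d(z, w_flipped)


def deconv2d_reference(x, w, stride):
    """Direct scatter-add definition of transposed convolution, for checking."""
    H, W, K = len(x), len(x[0]), len(w)
    out = [[0] * ((W - 1) * stride + K) for _ in range((H - 1) * stride + K)]
    for i in range(H):
        for j in range(W):
            for a in range(K):
                for b in range(K):
                    out[i * stride + a][j * stride + b] += x[i][j] * w[a][b]
    return out


x = [[1, 2], [3, 4]]
w = [[1, 0, 2], [0, 1, 0], [3, 0, 1]]  # asymmetric, so the flip matters
assert deconv2d_via_conv(x, w, 2) == deconv2d_reference(x, w, 2)
```

Note that the zero-inserted input makes many multiplications trivially zero, which may be one reason a deconvolution throughput figure can fall below the convolution figure on shared hardware; whether that explains the numbers reported above is not stated in the abstract.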
Updated: 2020-06-02