An Efficient FIFO Based Accelerator for Convolutional Neural Networks
Journal of Signal Processing Systems ( IF 1.8 ) Pub Date : 2021-02-20 , DOI: 10.1007/s11265-020-01632-0
Vineet Panchbhaiyye , Tokunbo Ogunfunmi

Over the last decade, Convolutional Neural Networks (CNNs) have become the go-to technique for deep learning tasks such as computer vision and speech recognition (LeCun et al., Nature 521(7553):436–444, 2015). Although CNNs are very effective at these tasks, they are not suitable for embedded applications due to the limited power budget. In this work we present an improved architecture for processing the convolution layers of a CNN. It builds on our earlier architecture, which uses FIFOs (First-In First-Out memories) to accelerate CNNs (Panchbhaiyye and Ogunfunmi, 2020). The proposed architecture exploits sparsity in a CNN layer's inputs and outputs to improve performance. We evaluate the proposed improvement on 16-bit floating-point and 8-bit integer data types and find that it reduces the processing time of the convolution layers of VGG16 by more than 13% with the float16 data type. We also show how this architecture can be used to compute fully connected layers. Overall, we exceed the performance of state-of-the-art architectures by more than 1.65× using an inexpensive Pynq Z1 board running at 100 MHz.
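The paper's accelerator is a hardware design, but the core idea it describes (streaming convolution through row FIFOs, with zero-valued operands skipping their multiply to exploit sparsity) can be illustrated in software. The sketch below is our own illustration, not the authors' implementation: the function name, the single-line-buffer layout, and the square-kernel assumption are all hypothetical.

```python
from collections import deque

def conv2d_fifo(image, kernel):
    """Stream a 2D convolution using a FIFO line buffer.

    Pixels arrive one at a time in raster order; a deque holds the most
    recent (K-1)*W + K pixels, so each output window is read from the
    FIFO instead of from random image addresses (as in the FIFO-based
    hardware architecture). Zero-valued pixels skip their multiply,
    illustrating the sparsity optimization.
    Assumes a square K x K kernel, no padding, stride 1.
    """
    H, W = len(image), len(image[0])
    K = len(kernel)
    out_h, out_w = H - K + 1, W - K + 1
    fifo = deque(maxlen=(K - 1) * W + K)   # line buffer FIFO
    out = [[0.0] * out_w for _ in range(out_h)]
    for r in range(H):
        for c in range(W):
            fifo.append(image[r][c])       # push newest pixel
            # A full K x K window ending at (r, c) exists once the
            # FIFO is full and (r, c) is past the first K-1 rows/cols.
            if len(fifo) == fifo.maxlen and r >= K - 1 and c >= K - 1:
                acc = 0.0
                for i in range(K):
                    for j in range(K):
                        px = fifo[i * W + j]   # tap into the line buffer
                        if px != 0:            # zero-skipping (sparsity)
                            acc += px * kernel[i][j]
                out[r - K + 1][c - K + 1] = acc
    return out
```

Because activations after ReLU layers are often mostly zero, the `px != 0` guard stands in for the hardware's ability to save cycles on sparse inputs; in the real accelerator the corresponding multiply-accumulate is skipped in the datapath rather than branched over.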




Updated: 2021-02-21