High Performance Kernel Architecture for Convolutional Neural Network Acceleration
Journal of Circuits, Systems and Computers (IF 0.9), Pub Date: 2021-05-25, DOI: 10.1142/s0218126621502662
Anakhi Hazarika 1 , Soumyajit Poddar 1 , Hafizur Rahaman 2

Convolutional neural networks (CNNs) have emerged as a prominent choice for artificial intelligence tasks. Recent advances in CNN design have greatly improved the performance and energy efficiency of several computation-intensive applications. In real-time applications, however, greater CNN accuracy is attained at the expense of very high computational cost and complexity. Implementing real-time CNNs on embedded platforms is also highly challenging because of resource and power constraints. This paper addresses this computational complexity and presents an accelerator architecture, accompanied by a novel kernel design, to improve overall CNN performance. The proposed kernel design introduces a computing mechanism that reduces the data movement cost, measured in computational cycle count (latency), by parallelizing the convolution processing elements. The architecture takes advantage of the overlap of spatially adjacent data. Its performance is also analyzed for multiple hyper-parameter configurations. The proposed accelerator reduces execution time by an average of 16× relative to the conventional computing architecture. We validate the architecture with the AlexNet and VGG-16 CNN models. The proposed accelerator architecture achieves an average 1.7× throughput improvement over state-of-the-art accelerators.
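
The abstract does not describe the kernel design in detail, so the following is only a minimal sketch of why overlap between spatially adjacent convolution windows can cut data movement. It counts input-element fetches for a k × k convolution, first when every window is fetched in full and then when each window in an output row reuses the columns it shares with its left neighbor. The function name `window_loads`, its parameters, and the row-wise reuse scheme are assumptions made for illustration; they are not the authors' architecture and say nothing about its cycle counts.

```python
def window_loads(width, height, k, stride=1):
    """Count input fetches for a k x k convolution on a width x height input.

    Illustrative assumption: no padding, and reuse happens only between
    horizontally adjacent windows in the same output row.
    Returns (naive_fetches, fetches_with_reuse).
    """
    out_w = (width - k) // stride + 1
    out_h = (height - k) // stride + 1

    # Naive scheme: every output position fetches its full k x k window.
    naive = out_w * out_h * k * k

    # Reuse scheme: the first window in each output row fetches k * k elements;
    # each later window fetches only the columns that are new to it.
    new_cols = min(stride, k)
    reuse = out_h * (k * k + (out_w - 1) * new_cols * k)

    return naive, reuse


if __name__ == "__main__":
    naive, reuse = window_loads(width=8, height=8, k=3, stride=1)
    print(f"naive fetches: {naive}, with row-wise reuse: {reuse}")
```

With a stride equal to the kernel size the windows no longer overlap, and both counts coincide, which is consistent with the idea that the saving comes from the shared data between adjacent windows.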

Updated: 2021-05-25