Channel Tiling for Improved Performance and Accuracy of Optical Neural Network Accelerators
arXiv - CS - Hardware Architecture, Pub Date: 2020-11-14, DOI: arxiv-2011.07391
Shurui Li, Mario Miscuglio, Volker J. Sorger, Puneet Gupta

Low-latency, high-throughput inference on Convolutional Neural Networks (CNNs) remains a challenge, especially for applications requiring large inputs or large kernel sizes. 4F optics offers a way to accelerate CNNs by converting convolutions into Fourier-domain point-wise multiplications that are computationally 'free' in the optical domain. However, existing 4F CNN systems suffer from the all-positive sensor readout issue, which makes implementing a multi-channel, multi-layer CNN unscalable or even impractical. In this paper we propose a simple channel tiling scheme for 4F CNN systems that uses the high resolution of the 4F system to perform channel summation inherently in the optical domain before sensor detection, so that the outputs of different channels are accumulated correctly. Compared to the state of the art, channel tiling gives similar accuracy, significantly better robustness to sensing quantization error (33% improvement in required sensing precision) and noise (10dB reduction in tolerable sensing noise), 0.5X the total filters required, 10-50X+ throughput improvement, and as much as a 3X reduction in required output camera resolution/bandwidth. Because it requires no additional optical hardware, the proposed channel tiling approach addresses an important throughput and precision bottleneck of high-speed, massively parallel optical 4F computing systems.
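The two ideas in the abstract, convolution as a Fourier-domain point-wise product and channel summation by tiling, can be illustrated numerically. The following is a minimal NumPy sketch under stated assumptions, not the authors' optical layout: the function names `fft_conv2d` and `tiled_channel_conv`, the side-by-side tile arrangement, the `pitch` spacing, and the squared-magnitude model of sensor detection are all hypothetical choices made for illustration.

```python
import numpy as np

def fft_conv2d(a, b):
    """Full linear 2D convolution computed via the convolution theorem."""
    out_shape = (a.shape[0] + b.shape[0] - 1, a.shape[1] + b.shape[1] - 1)
    # Zero-padding both operands to out_shape makes the circular
    # convolution implied by the FFT equal to the linear convolution.
    spectrum = np.fft.fft2(a, s=out_shape) * np.fft.fft2(b, s=out_shape)
    return np.real(np.fft.ifft2(spectrum))

def tiled_channel_conv(x, k):
    """Sum of per-channel convolutions obtained from ONE large convolution.

    x has shape (C, H, W) and k has shape (C, kh, kw). Inputs are tiled
    left to right and kernels in reverse order, so every matching pair
    (x[c], k[c]) lands on the same output window (hypothetical layout).
    """
    C, H, W = x.shape
    _, kh, kw = k.shape
    out_w = W + kw - 1
    pitch = out_w  # smallest spacing that keeps cross-channel terms apart
    big_x = np.zeros((H, (C - 1) * pitch + W))
    big_k = np.zeros((kh, (C - 1) * pitch + kw))
    for c in range(C):
        big_x[:, c * pitch : c * pitch + W] = x[c]
        r = (C - 1 - c) * pitch
        big_k[:, r : r + kw] = k[c]
    full = fft_conv2d(big_x, big_k)
    start = (C - 1) * pitch  # window where all same-channel terms overlap
    return full[:, start : start + out_w]

rng = np.random.default_rng(0)
x = rng.random((3, 8, 8))           # non-negative inputs, 3 channels
k = rng.standard_normal((3, 3, 3))  # signed kernels, one per channel

reference = sum(fft_conv2d(x[c], k[c]) for c in range(3))
tiled = tiled_channel_conv(x, k)

# Assumed detection model: the sensor reads a squared magnitude, so
# detecting each channel separately discards the signs before summing.
detected_then_summed = sum(fft_conv2d(x[c], k[c]) ** 2 for c in range(3))
summed_then_detected = tiled ** 2
```

In this sketch `tiled` matches `reference` up to floating-point error, while `detected_then_summed` and `summed_then_detected` differ, which is one way to see why accumulating channels before detection matters. The price is a wider plane: the tiled input is roughly C times the width of a single channel, consistent with the abstract's point that the scheme relies on the high resolution of the 4F system.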

Updated: 2020-11-17