Performance estimation for the memristor-based computing-in-memory implementation of extremely factorized network for real-time and low-power semantic segmentation
Neural Networks (IF 7.8) Pub Date: 2023-01-13, DOI: 10.1016/j.neunet.2023.01.008
Shuai Dong 1, Zhen Fan 1, Yihong Chen 2, Kaihui Chen 2, Minghui Qin 2, Min Zeng 2, Xubing Lu 2, Guofu Zhou 3, Xingsen Gao 2, Jun-Ming Liu 4

Nowadays, many semantic segmentation algorithms achieve satisfactory accuracy on von Neumann platforms (e.g., GPUs), but their speed and energy consumption have not met the demanding requirements of certain edge applications such as autonomous driving. To tackle this issue, it is necessary to design an efficient lightweight semantic segmentation algorithm and then implement it on emerging hardware platforms with high speed and energy efficiency. Here, we first propose an extremely factorized network (EFNet) that can learn multi-scale context information while preserving rich spatial information with reduced model complexity. Experimental results on the Cityscapes dataset show that EFNet achieves an accuracy of 68.0% mean intersection over union (mIoU) with only 0.18M parameters, at a speed of 99 frames per second (FPS) on a single RTX 3090 GPU. Then, to further improve speed and energy efficiency, we design a memristor-based computing-in-memory (CIM) accelerator for the hardware implementation of EFNet. Simulation in DNN+NeuroSim V2.0 shows that the memristor-based CIM accelerator is 63× (4.6×) smaller in area, up to 9.2× (1000×) faster, and 470× (2400×) more energy-efficient than the RTX 3090 GPU (the Jetson Nano embedded development board), although its accuracy decreases slightly by 1.7% mIoU. Therefore, the memristor-based CIM accelerator has great potential to be deployed at the edge to run lightweight semantic segmentation models like EFNet. This study showcases an algorithm-hardware co-design that realizes real-time, low-power semantic segmentation at the edge.
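The abstract does not specify EFNet's exact layer design, but "factorized" networks of this kind typically replace a k×k convolution with a k×1 convolution followed by a 1×k convolution, which is one common way to reduce parameter count while keeping the receptive field. The sketch below is purely illustrative (the function names and the 3×3, 64-channel configuration are assumptions, not taken from the paper) and simply counts parameters to show the reduction such factorization gives.

```python
# Illustrative sketch (not the paper's actual architecture): compare the
# parameter count of a standard k x k convolution with a factorized
# k x 1 + 1 x k pair at the same channel width.

def conv2d_params(k_h, k_w, c_in, c_out, bias=False):
    """Parameter count of a single 2D convolution layer."""
    return k_h * k_w * c_in * c_out + (c_out if bias else 0)

def standard_conv_params(k, c):
    # One k x k convolution, c input channels and c output channels.
    return conv2d_params(k, k, c, c)

def factorized_conv_params(k, c):
    # A k x 1 convolution followed by a 1 x k convolution.
    return conv2d_params(k, 1, c, c) + conv2d_params(1, k, c, c)

if __name__ == "__main__":
    k, c = 3, 64  # assumed example configuration
    std = standard_conv_params(k, c)    # 3 * 3 * 64 * 64 = 36864
    fac = factorized_conv_params(k, c)  # 2 * 3 * 64 * 64 = 24576
    print(std, fac, round(1 - fac / std, 2))  # 36864 24576 0.33
```

For a 3×3 kernel the factorized pair needs 2/3 of the weights; the saving grows with kernel size (k² vs. 2k per channel pair), which is why factorization is a common ingredient in lightweight segmentation backbones.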




Updated: 2023-01-17