Efficient Acceleration of Deep Learning Inference on Resource-Constrained Edge Devices: A Review
Proceedings of the IEEE (IF 23.2) Pub Date: 2022-12-14, DOI: 10.1109/jproc.2022.3226481
Md. Maruf Hossain Shuvo 1, Syed Kamrul Islam 1, Jianlin Cheng 2, Bashir I. Morshed 3

Successful integration of deep neural networks (DNNs) or deep learning (DL) has resulted in breakthroughs in many areas. However, deploying these highly accurate models in end-user applications as data-driven, learned, automatic, and practical machine learning (ML) solutions remains challenging. DL algorithms are often computationally expensive, power-hungry, and require large amounts of memory to process the complex and iterative operations of millions of parameters. Hence, training and inference of DL models are typically performed on high-performance computing (HPC) clusters in the cloud. Data transmission to the cloud results in high latency, round-trip delay, security and privacy concerns, and an inability to make real-time decisions. Thus, processing on edge devices can significantly reduce cloud transmission costs. Edge devices are the end devices closest to the user, such as mobile phones, cyber–physical systems (CPSs), wearables, the Internet of Things (IoT), embedded and autonomous systems, and intelligent sensors. These devices have limited memory, computing resources, and power-handling capability. Therefore, optimization techniques at both the hardware and software levels have been developed to handle DL deployment efficiently on the edge. Understanding the existing research, challenges, and opportunities is fundamental to leveraging the next generation of edge devices with artificial intelligence (AI) capability. Mainly, four research directions have been pursued for efficient DL inference on edge devices: 1) novel DL architecture and algorithm design; 2) optimization of existing DL methods; 3) development of algorithm–hardware codesign; and 4) efficient accelerator design for DL deployment. This article surveys each of these four research directions, providing a comprehensive review of the state-of-the-art tools and techniques for efficient edge inference.
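
As a concrete illustration of the second research direction (optimizing existing DL models for resource-constrained inference), the following is a minimal sketch of post-training dynamic quantization using PyTorch. The toy model and layer sizes are hypothetical and not taken from the paper; the sketch only shows the general technique of converting weights to int8 to shrink the memory footprint for CPU-only edge targets.

```python
import torch
import torch.nn as nn

# Hypothetical small model standing in for a pre-trained network.
model = nn.Sequential(
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)
model.eval()

# Post-training dynamic quantization: weights of Linear layers are
# converted to int8; activations are quantized on the fly at inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Inference with the quantized model; weight storage and, typically,
# latency are reduced compared with the float32 original.
x = torch.randn(1, 128)
with torch.no_grad():
    out = quantized(x)
print(out.shape)  # torch.Size([1, 10])
```

Dynamic quantization is only one of several optimization techniques the survey covers (alongside pruning, knowledge distillation, and static/quantization-aware schemes); it is shown here because it requires no retraining and is therefore a common first step for edge deployment.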

Updated: 2024-08-28