Automated Exploration and Implementation of Distributed CNN Inference at the Edge
IEEE Internet of Things Journal (IF 10.6), Pub Date: 2023-01-17, DOI: 10.1109/jiot.2023.3237572
Xiaotian Guo, Andy D. Pimentel, Todor Stefanov
For the model inference of convolutional neural networks (CNNs), we nowadays witness a shift from the Cloud to the Edge. Unfortunately, deploying and inferring large, compute- and memory-intensive CNNs on Internet of Things devices at the Edge is challenging because such devices typically have limited resources. One approach to address this challenge is to leverage all available resources across multiple edge devices to execute a large CNN by properly partitioning it and running each CNN partition on a separate edge device. However, there currently does not exist a design and programming framework that takes a trained CNN model as input and subsequently allows for efficiently exploring and automatically implementing a range of different CNN partitions on multiple edge devices to facilitate distributed CNN inference. Therefore, in this article, we propose a novel framework that automates both the splitting of a CNN model into a set of submodels and the code generation needed for the distributed and collaborative execution of these submodels on multiple, possibly heterogeneous, edge devices, while supporting the exploitation of parallelism both among and within the edge devices. In addition, since the number of different CNN mapping possibilities on multiple edge devices is vast, our framework also features a multistage and hierarchical design space exploration methodology to efficiently search for (near-)optimal distributed CNN inference implementations. Our experimental results demonstrate that our work allows for rapidly finding and realizing distributed CNN inference implementations with reduced energy consumption and memory usage per edge device, and, under certain conditions, with improved system throughput as well.
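The core idea the abstract describes — cutting a trained CNN's layer sequence into contiguous submodels, each of which one edge device executes before forwarding its intermediate tensor to the next — can be illustrated with a minimal NumPy sketch. This is not the authors' framework; the model, the `partition` helper, and the two simulated "devices" are all hypothetical stand-ins chosen for illustration.

```python
import numpy as np

def conv1x1(w):
    # 1x1 convolution over the channel axis with ReLU:
    # x has shape (C_in, H, W), w has shape (C_out, C_in).
    return lambda x: np.maximum(np.tensordot(w, x, axes=([1], [0])), 0.0)

def global_avg_pool(x):
    # Collapse spatial dimensions: (C, H, W) -> (C,)
    return x.mean(axis=(1, 2))

rng = np.random.default_rng(0)
# A toy "trained" CNN as a sequence of layer callables.
layers = [conv1x1(rng.standard_normal((8, 3))),
          conv1x1(rng.standard_normal((16, 8))),
          conv1x1(rng.standard_normal((4, 16))),
          global_avg_pool]

def run(model, x):
    for layer in model:
        x = layer(x)
    return x

def partition(layers, cuts):
    # Split the layer sequence at the given cut points; each
    # contiguous slice becomes a submodel for one edge device.
    bounds = [0] + cuts + [len(layers)]
    return [layers[a:b] for a, b in zip(bounds, bounds[1:])]

x = rng.standard_normal((3, 32, 32))
full = run(layers, x)

# Simulate two devices in a pipeline: device 1 computes its slice
# and "sends" the intermediate tensor to device 2.
sub1, sub2 = partition(layers, [2])
distributed = run(sub2, run(sub1, x))

# Layer-wise partitioning preserves the full model's output exactly.
assert np.allclose(full, distributed)
```

In a real deployment, the cut points are exactly what a design space exploration would search over, trading per-device memory and energy against the cost of transferring intermediate tensors between devices; parallelism within a device (e.g., splitting a layer's feature maps) adds further dimensions to that space.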

Updated: 2023-01-17