DeeperThings: Fully Distributed CNN Inference on Resource-Constrained Edge Devices
International Journal of Parallel Programming (IF 1.5), Pub Date: 2021-04-07, DOI: 10.1007/s10766-021-00712-3
Rafael Stahl , Alexander Hoffman , Daniel Mueller-Gritschneder , Andreas Gerstlauer , Ulf Schlichtmann

Performing inference of Convolutional Neural Networks (CNNs) on Internet of Things (IoT) edge devices ensures both privacy of input data and possible run time reductions when compared to a cloud solution. As most edge devices are memory- and compute-constrained, they cannot store and execute complex CNNs. Partitioning and distributing layer information across multiple edge devices to reduce the amount of computation and data on each device presents a solution to this problem. In this article, we propose DeeperThings, an approach that supports a full distribution of CNN inference tasks by partitioning fully-connected as well as both feature- and weight-intensive convolutional layers. Additionally, we jointly optimize memory, computation and communication demands. This is achieved using techniques to combine both feature and weight partitioning with a communication-aware layer fusion method, enabling holistic optimization across layers. For a given number of edge devices, the schemes are applied jointly using Integer Linear Programming (ILP) formulations to minimize data exchanged between devices, to optimize run times and to find the entire model’s minimal memory footprint. Experimental results from a real-world hardware setup running four different CNN models confirm that the scheme is able to evenly balance the memory footprint between devices. For six devices on 100 Mbit/s connections the integration of layer fusion additionally leads to a reduction of communication demands by up to 28.8%. This results in run time speed-up of the inference task by up to 1.52x compared to layer partitioning without fusing.
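To illustrate the weight-partitioning idea described in the abstract, the following is a minimal NumPy sketch (not the paper's implementation): each device holds a disjoint slice of a convolutional layer's filters, computes only its output channels, and the partial results are concatenated. Function names and shapes here are illustrative assumptions.

```python
import numpy as np

def conv2d(x, w):
    # Naive "valid" convolution: x has shape (C_in, H, W),
    # w has shape (C_out, C_in, K, K).
    c_out, c_in, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((c_out, h_out, w_out))
    for o in range(c_out):
        for i in range(h_out):
            for j in range(w_out):
                out[o, i, j] = np.sum(x[:, i:i+k, j:j+k] * w[o])
    return out

def partitioned_conv2d(x, w, n_devices):
    # Weight partitioning: split the filters (output channels) across
    # devices; each device computes its share, then outputs are merged.
    parts = np.array_split(w, n_devices, axis=0)
    partial_outputs = [conv2d(x, wp) for wp in parts]  # one per device
    return np.concatenate(partial_outputs, axis=0)

x = np.random.rand(3, 8, 8)       # toy input feature maps
w = np.random.rand(6, 3, 3, 3)    # six filters to distribute
assert np.allclose(conv2d(x, w), partitioned_conv2d(x, w, 3))
```

In this scheme each device only stores its filter slice, which is how memory footprint scales down with device count; the paper's contribution lies in additionally partitioning feature maps and fusing layers, with an ILP choosing the partitioning to minimize communication.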




Updated: 2021-04-08