Memory-efficient deep learning inference with incremental weight loading and data layout reorganization on edge systems
Journal of Systems Architecture (IF 3.7), Pub Date: 2021-05-20, DOI: 10.1016/j.sysarc.2021.102183
Cheng Ji , Fan Wu , Zongwei Zhu , Li-pin Chang , Huanghe Liu , Wenjie Zhai

Pattern recognition applications such as face recognition and agricultural product detection have rapidly drawn interest in Cyber–Physical–Social Systems (CPSS). These CPSS applications rely on deep neural networks (DNNs) to perform image classification. However, traditional DNN inference models hosted in the cloud can suffer from network delay fluctuations and privacy leakage. Consequently, current real-time CPSS applications are preferably deployed on edge embedded devices. Given the computing power and memory limitations of edge devices, improving memory management efficiency is key to improving the quality of service of model inference. First, this study explores an incremental loading strategy for model weights during inference. Second, runtime memory space is optimized through data layout reorganization along the spatial dimension. Notably, the proposed schemes are orthogonal to existing models. Experimental results demonstrate that the proposed approach reduces memory consumption by 61.05% without additional inference time overhead.
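To illustrate the core idea of incremental weight loading described above, the following is a minimal sketch (not the paper's actual implementation): a tiny multilayer perceptron whose per-layer weights are stored in separate files, so that only one layer's weights need to be resident in memory at any point during inference. The layer shapes, file naming, and helper functions are all hypothetical, chosen only for illustration.

```python
import os
import tempfile
import numpy as np

# Hypothetical layer sizes for a toy MLP (not from the paper).
LAYER_SHAPES = [(8, 16), (16, 16), (16, 4)]

def save_layer_weights(directory):
    """Persist each layer's weight matrix to its own file,
    emulating a model stored as per-layer chunks on disk."""
    rng = np.random.default_rng(0)
    paths = []
    for i, shape in enumerate(LAYER_SHAPES):
        w = rng.standard_normal(shape).astype(np.float32)
        p = os.path.join(directory, f"layer{i}.npy")
        np.save(p, w)
        paths.append(p)
    return paths

def incremental_inference(x, weight_paths):
    """Run inference loading one layer's weights at a time,
    releasing each layer before loading the next."""
    peak_resident = 0
    for p in weight_paths:
        w = np.load(p)                      # load only this layer's weights
        peak_resident = max(peak_resident, w.nbytes)
        x = np.maximum(x @ w, 0.0)          # ReLU layer
        del w                               # free before the next load
    return x, peak_resident

with tempfile.TemporaryDirectory() as d:
    paths = save_layer_weights(d)
    x = np.ones((1, 8), dtype=np.float32)
    out, peak = incremental_inference(x, paths)
    # Footprint if all weights were loaded at once (float32 = 4 bytes).
    total = sum(a * b for a, b in LAYER_SHAPES) * 4
    print(out.shape, peak, total)
```

In this toy setting the peak resident weight memory is that of the largest single layer rather than the sum over all layers; the paper's reported 61.05% reduction additionally involves data layout reorganization, which this sketch does not model.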


Updated: 2021-05-25