Communication-Computation Trade-off in Resource-Constrained Edge Inference
IEEE Communications Magazine (IF 11.2), Pub Date: 2020-12-01, DOI: 10.1109/mcom.001.2000373
Jiawei Shao, Jun Zhang

The recent breakthrough in artificial intelligence (AI), especially deep neural networks (DNNs), has affected every branch of science and technology. In particular, edge AI has been envisioned as a major application scenario for providing DNN-based services at edge devices. This article presents effective methods for edge inference at resource-constrained devices. It focuses on device-edge co-inference, assisted by an edge computing server, and investigates a critical trade-off between the computational cost of the on-device model and the communication overhead of forwarding the intermediate feature to the edge server. A general three-step framework is proposed for effective inference: model split point selection to determine the on-device model; communication-aware model compression to simultaneously reduce the on-device computation and the resulting communication overhead; and task-oriented encoding of the intermediate feature to further reduce the communication overhead. Experiments demonstrate that our proposed framework achieves a better trade-off and significantly lower inference latency than baseline methods.
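To make the device-edge co-inference workflow concrete, the sketch below splits a DNN at a chosen layer, runs the head on the device, and sends a compactly encoded intermediate feature to a server-side tail. This is a minimal illustration under stated assumptions, not the paper's implementation: the model (torchvision's ResNet-18), the split index, and the 8-bit uniform quantizer (a stand-in for the paper's task-oriented encoding) are all hypothetical choices.

```python
# Minimal sketch of device-edge co-inference (assumptions: ResNet-18,
# split index 6, int8 quantization as a stand-in for task-oriented encoding).
import io

import torch
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(weights=None).eval()    # untrained weights; demo only
layers = list(model.children())          # conv1, bn1, relu, maxpool, layer1..4, avgpool, fc

SPLIT = 6  # hypothetical split point: layers[:SPLIT] run on the device

on_device = nn.Sequential(*layers[:SPLIT])
# The server-side tail needs the flatten that ResNet's forward() applies
# between avgpool and the final fully connected layer.
on_server = nn.Sequential(*layers[SPLIT:-1], nn.Flatten(), layers[-1])

x = torch.randn(1, 3, 224, 224)          # dummy input image

with torch.no_grad():
    feature = on_device(x)               # on-device computation

# Communication step: 8-bit uniform quantization of the intermediate feature.
scale = feature.abs().max() / 127.0
feature_q = (feature / scale).round().clamp(-128, 127).to(torch.int8)

buf = io.BytesIO()
torch.save(feature_q, buf)               # serialize the payload sent uplink
print(f"feature tensor shape: {tuple(feature.shape)}")
print(f"bytes sent to edge server (int8): {buf.getbuffer().nbytes}")

# Server side: dequantize and finish the inference.
with torch.no_grad():
    logits = on_server(feature_q.float() * scale)
print(f"predicted class: {logits.argmax(dim=1).item()}")
```

Moving SPLIT earlier shrinks the on-device computation but typically yields a larger intermediate feature (and vice versa), which is exactly the communication-computation trade-off the framework's split point selection, compression, and feature encoding steps are designed to navigate.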
