An adaptive DNN inference acceleration framework with end–edge–cloud collaborative computing
Future Generation Computer Systems (IF 7.5) Pub Date: 2022-11-04, DOI: 10.1016/j.future.2022.10.033
Guozhi Liu, Fei Dai, Xiaolong Xu, Xiaodong Fu, Wanchun Dou, Neeraj Kumar, Muhammad Bilal

Intelligent applications based on Deep Neural Networks (DNNs) have been widely deployed on mobile devices. Unfortunately, resource-constrained mobile devices cannot meet the stringent latency requirements of these applications because of the large amount of computation they require. Existing cloud-assisted and edge-assisted DNN inference approaches can both reduce end-to-end inference latency by offloading DNN computations to the cloud server or to edge servers, but they suffer either from unpredictable communication latency caused by massive data transmission over long wide-area links or from performance degradation caused by the limited computation resources of edge servers. In this paper, we propose an adaptive DNN inference acceleration framework that accelerates DNN inference by fully exploiting end–edge–cloud collaborative computing. First, a latency prediction model is built to estimate the layer-wise execution latency of a DNN on different heterogeneous computing platforms; it uses neural networks to learn non-linear features related to inference latency. Second, a computation partitioning algorithm is designed to identify two optimal partitioning points, which adaptively distribute DNN computations across end devices, edge servers, and the cloud server to minimize DNN inference latency. Finally, we conduct extensive experiments on three widely adopted DNNs. The experimental results show that our latency prediction models improve prediction accuracy by about 72.31% on average compared with four baseline approaches, and our computation partitioning approach reduces end-to-end latency by about 20.81% on average compared with six baseline approaches under three wireless networks.
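
To make the two-point partitioning idea concrete, the following is a minimal Python sketch, not the authors' implementation: it assumes per-layer latency estimates for the end device, the edge server, and the cloud server (for example, the outputs of the latency prediction model), per-cut transmission volumes, and the bandwidths of the device–edge and edge–cloud links, and it brute-forces the pair of partition points that minimizes the estimated end-to-end latency. All names and numbers below are illustrative assumptions.

from typing import List, Tuple

def partition_latency(device_lat: List[float],
                      edge_lat: List[float],
                      cloud_lat: List[float],
                      out_size: List[float],
                      dev_edge_bw: float,
                      edge_cloud_bw: float) -> Tuple[int, int, float]:
    """Search two partition points (p1, p2) so that layers [0, p1) run on the
    end device, [p1, p2) on the edge server, and [p2, n) on the cloud server,
    minimizing the estimated end-to-end latency.

    device_lat / edge_lat / cloud_lat: per-layer execution latency estimates
        (seconds) on each platform, e.g. from the latency prediction model.
    out_size: out_size[i] is the data volume (MB) that must be transmitted if
        a cut is placed before layer i (out_size[0] is the raw input size).
    dev_edge_bw / edge_cloud_bw: link bandwidths (MB/s); the latency of
        returning the final result is ignored to keep the sketch short.
    """
    n = len(device_lat)
    best = (0, 0, float("inf"))
    for p1 in range(n + 1):            # layers [0, p1) stay on the device
        for p2 in range(p1, n + 1):    # layers [p1, p2) run on the edge
            compute = (sum(device_lat[:p1])
                       + sum(edge_lat[p1:p2])
                       + sum(cloud_lat[p2:]))
            transfer = 0.0
            if p1 < n:                 # some layers run off-device
                transfer += out_size[p1] / dev_edge_bw
            if p2 < n:                 # some layers run on the cloud
                transfer += out_size[p2] / edge_cloud_bw
            if compute + transfer < best[2]:
                best = (p1, p2, compute + transfer)
    return best

# Toy usage for a 4-layer model; all values are made up for illustration.
p1, p2, latency = partition_latency(
    device_lat=[0.030, 0.050, 0.080, 0.010],
    edge_lat=[0.010, 0.015, 0.020, 0.004],
    cloud_lat=[0.002, 0.003, 0.004, 0.001],
    out_size=[1.50, 0.80, 0.40, 0.20],  # MB at each possible cut point
    dev_edge_bw=5.0,                    # MB/s on the wireless access link
    edge_cloud_bw=20.0)                 # MB/s on the wide-area link
print(f"run layers [0,{p1}) on device, [{p1},{p2}) on edge, "
      f"[{p2},4) on cloud; estimated latency {latency:.3f}s")

In the paper, the per-layer latencies fed into such a search come from the learned latency prediction model rather than exhaustive on-device profiling, which is what lets the partitioning adapt to heterogeneous platforms and different wireless networks.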



Updated: 2022-11-04