Deep Learning Inference with Dynamic Graphs on Heterogeneous Platforms, International Journal of Parallel Programming


International Journal of Parallel Programming (IF 0.9), Pub Date: 2020-02-12, DOI: 10.1007/s10766-020-00654-2
V. Pothos, E. Vassalos, I. Theodorakopoulos, N. Fragoulis

One major drawback of deep-learning algorithms is the high computational complexity and memory bandwidth required for inference. To reduce these costs in applications that use Convolutional Neural Networks (CNNs), a new and radical approach is the dynamic pruning of kernels, which aims at parsimonious inference by learning to exploit and dynamically remove the redundant capacity of a CNN architecture. This conditional execution approach provides a systematic, data-driven method for developing CNNs that are trained to change size and form in real time during inference, targeting the smallest possible computational footprint. Conditional execution, however, introduces a number of challenges when these algorithms are implemented on embedded systems. In this paper we present a systematic way of deploying this new dynamic pruning methodology on heterogeneous platforms that feature both CPU and GPU subsystems. Real-time measurements of embedded implementations on modern SoCs verify the efficacy of the proposed methodology and demonstrate the ability of the dynamic networks both to adapt their size to the complexity of the task and to deliver significant computational gains during inference.
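
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of conditional execution, not the authors' method. It assumes a hypothetical linear gating head (`gate_scores`) that scores each kernel from the input, a fixed threshold, and a layer reduced to 1x1 kernels at a single spatial position, so each kernel is a plain dot product. All function names, weights, and the threshold are illustrative assumptions.

```python
from typing import List, Tuple

Vector = List[float]


def gate_scores(features: Vector, gate_weights: List[Vector], gate_bias: Vector) -> Vector:
    """Hypothetical gating head: one linear score per kernel, computed from the input."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(gate_weights, gate_bias)
    ]


def active_kernels(scores: Vector, threshold: float = 0.0) -> List[int]:
    """Indices of the kernels whose gate score exceeds the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


def conditional_layer(
    features: Vector,
    kernels: List[Vector],
    gate_weights: List[Vector],
    gate_bias: Vector,
    threshold: float = 0.0,
) -> Tuple[Vector, List[int]]:
    """Evaluate only the gated-on kernels; outputs of pruned kernels stay at zero."""
    keep = active_kernels(gate_scores(features, gate_weights, gate_bias), threshold)
    out = [0.0] * len(kernels)
    for i in keep:  # pruned kernels are skipped, so their dot products are never computed
        out[i] = sum(w * x for w, x in zip(kernels[i], features))
    return out, keep


# Toy input: three kernels over a two-channel feature vector (values are made up).
features = [1.0, 2.0]
kernels = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gate_weights = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
gate_bias = [0.0, 0.0, -1.0]

out, keep = conditional_layer(features, kernels, gate_weights, gate_bias)
```

For this toy input the gate keeps kernels 0 and 2 and skips kernel 1. Because the set of active kernels depends on the input, the amount of work varies from sample to sample, which is the property the abstract identifies as a challenge for embedded deployment.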


Updated: 2020-02-12
