Evaluating and analyzing the energy efficiency of CNN inference on high‐performance GPU
Concurrency and Computation: Practice and Experience (IF 1.5), Pub Date: 2020-10-21, DOI: 10.1002/cpe.6064
Chunrong Yao, Wantao Liu, Weiqing Tang, Jinrong Guo, Songlin Hu, Yijun Lu, Wei Jiang

Convolutional neural network (CNN) inference usually runs on high-performance graphics processing units (GPUs). Because GPUs are high-power devices, deep learning workloads cause energy consumption to rise sharply. The energy efficiency of CNN inference depends not only on the software and hardware configuration but also on the application requirements of the inference tasks; however, how these factors interact on GPUs is not yet well understood. In this paper, we conduct a comprehensive study of the model-level and layer-level energy efficiency of popular CNN models. The results point out several opportunities for further optimization. We also analyze parameter settings (i.e., batch size, dynamic voltage and frequency scaling) and propose a revenue model that enables an optimal trade-off between energy efficiency and latency. Compared with the default settings, the optimal settings improve revenue by up to 15.31×. We obtain the following main findings: (i) GPUs do not exploit the parallelism available from model depth and small convolution kernels, resulting in low energy efficiency; (ii) convolutional layers are the most energy-consuming CNN layers, yet, owing to the cache, power consumption across all layers is relatively balanced; and (iii) the energy efficiency of TensorRT is 1.53× that of TensorFlow.
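As a rough illustration of how model-level energy efficiency can be measured on a GPU, the sketch below samples board power through NVML (via the pynvml bindings) while an inference call runs and reports images per joule. The function names, the batch handling, and the sampling interval are assumptions for illustration only; they are not the instrumentation described in the paper.

```python
# Minimal sketch (assumptions, not the paper's harness): sample GPU board
# power with NVML while one batched inference runs, then report
# images-per-joule as an energy-efficiency metric.
import time
import threading
import pynvml


def sample_power(handle, samples, stop_event, interval_s=0.05):
    """Append (timestamp, watts) pairs until stop_event is set."""
    while not stop_event.is_set():
        mw = pynvml.nvmlDeviceGetPowerUsage(handle)  # milliwatts
        samples.append((time.time(), mw / 1000.0))
        time.sleep(interval_s)


def measure_energy_efficiency(run_inference, num_images):
    """Return (images_per_joule, latency_s) for one inference run.

    run_inference: a zero-argument callable that performs the batched
    forward pass (hypothetical; supplied by the caller).
    """
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    samples, stop = [], threading.Event()
    sampler = threading.Thread(target=sample_power, args=(handle, samples, stop))
    start = time.time()
    sampler.start()
    run_inference()          # e.g. one batch of CNN inference
    stop.set()
    sampler.join()
    latency = time.time() - start
    pynvml.nvmlShutdown()
    # Approximate energy as mean sampled power * elapsed time;
    # trapezoidal integration over the samples would be more precise.
    mean_watts = sum(w for _, w in samples) / max(len(samples), 1)
    energy_joules = mean_watts * latency
    return num_images / energy_joules, latency
```

Sweeping this measurement over batch sizes and GPU clock settings (DVFS) yields the energy-efficiency/latency points among which a revenue-style objective, as proposed in the paper, can pick an operating point.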
