High Performance Depthwise and Pointwise Convolutions on Mobile Devices
arXiv - CS - Performance Pub Date : 2020-01-03 , DOI: arxiv-2001.02504
Pengfei Zhang, Eric Lo, Baotong Lu

Lightweight convolutional neural networks (e.g., MobileNets) are specifically designed to carry out inference directly on mobile devices. Among the various lightweight models, depthwise convolution (DWConv) and pointwise convolution (PWConv) are the key operations. In this paper, we observe that existing implementations of DWConv and PWConv do not utilize the ARM processors in mobile devices well: they exhibit many cache misses in multi-core execution and poor data reuse at the register level. We propose techniques to re-optimize the implementations of DWConv and PWConv for the ARM architecture. Experimental results show that our implementations achieve speedups of up to 5.5x on DWConv and 2.1x on PWConv over TVM (Chen et al. 2018).
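To make the two operations concrete, here is a minimal NumPy sketch of a MobileNet-style depthwise-separable block: a depthwise convolution (one k×k filter per channel, no cross-channel mixing) followed by a pointwise 1×1 convolution (channel mixing at each spatial position). This is a plain reference implementation for illustration only, not the paper's optimized ARM implementation; all shapes and names are illustrative.

```python
import numpy as np

def depthwise_conv(x, w):
    """Depthwise convolution: each input channel is convolved with its
    own k x k filter; channels are never mixed.
    x: (C, H, W) input, w: (C, k, k) filters. Stride 1, 'valid' padding."""
    C, H, W = x.shape
    _, k, _ = w.shape
    out = np.zeros((C, H - k + 1, W - k + 1))
    for c in range(C):                      # one filter per channel
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                out[c, i, j] = np.sum(x[c, i:i + k, j:j + k] * w[c])
    return out

def pointwise_conv(x, w):
    """Pointwise (1x1) convolution: a linear mix of channels at every
    spatial position. x: (C, H, W), w: (C_out, C) -> (C_out, H, W)."""
    return np.einsum('oc,chw->ohw', w, x)

# Depthwise-separable block: 8 -> 16 channels on a 10x10 input.
x = np.random.rand(8, 10, 10)   # 8 channels, 10x10 spatial
dw = np.random.rand(8, 3, 3)    # one 3x3 filter per channel
pw = np.random.rand(16, 8)      # 1x1 conv expanding 8 -> 16 channels
y = pointwise_conv(depthwise_conv(x, dw), pw)
print(y.shape)  # (16, 8, 8)
```

The optimizations described in the paper target exactly these loop nests: tiling them to reduce cache misses across cores and reordering them to improve register-level reuse on ARM.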

Updated: 2020-01-09