DEEPEYE
ACM Transactions on Embedded Computing Systems (IF 2.8), Pub Date: 2020-05-25, DOI: 10.1145/3381805
Yuan Cheng, Guangya Li, Ngai Wong, Hai-Bao Chen, Hao Yu

Video object detection and action recognition typically require deep neural networks (DNNs) with a huge number of parameters. It is therefore challenging to develop a DNN video comprehension unit on resource-constrained terminal devices. In this article, we introduce a deeply tensor-compressed video comprehension neural network, called DEEPEYE, for inference on terminal devices. Instead of building a Long Short-Term Memory (LSTM) network directly from high-dimensional raw video input, we construct an LSTM-based spatio-temporal model from structured, tensorized time-series features for object detection and action recognition. Deep compression is achieved by tensor decomposition and trained quantization of the time-series feature-based LSTM network. We have implemented DEEPEYE on an ARM-core-based IoT board, running at 31 FPS while consuming only 2.4 W. Using the video datasets MOMENTS, UCF11, and HMDB51 as benchmarks, DEEPEYE achieves 228.1× model compression with only 0.47% mAP reduction, as well as a 15k× parameter reduction with up to 8.01% accuracy improvement over competing approaches.
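The compression described above combines tensor decomposition of the LSTM weights with trained quantization. The sketch below only illustrates that general idea, assuming NumPy, a toy 2-core tensor-train (TT) factorization of a single weight matrix, and plain uniform 8-bit quantization in place of trained quantization; the shapes, rank, and helper names are invented for the example and are not the configuration used in DEEPEYE.

import numpy as np

def tt_decompose_2core(W, in_modes, out_modes, rank):
    """Factor W (I x J) into two TT cores, where I = i1*i2 and J = j1*j2."""
    i1, i2 = in_modes
    j1, j2 = out_modes
    # Group (i1, j1) into the first core and (i2, j2) into the second,
    # then truncate the SVD of the regrouped matrix to the chosen TT rank.
    T = W.reshape(i1, i2, j1, j2).transpose(0, 2, 1, 3).reshape(i1 * j1, i2 * j2)
    U, S, Vt = np.linalg.svd(T, full_matrices=False)
    U, S, Vt = U[:, :rank], S[:rank], Vt[:rank, :]
    core1 = (U * S).reshape(i1, j1, rank)            # shape (i1, j1, r)
    core2 = Vt.reshape(rank, i2, j2)                 # shape (r, i2, j2)
    return core1, core2

def tt_reconstruct(core1, core2):
    """Contract the two cores back into a dense (I x J) matrix."""
    i1, j1, _ = core1.shape
    _, i2, j2 = core2.shape
    T = np.tensordot(core1, core2, axes=([2], [0]))  # (i1, j1, i2, j2)
    return T.transpose(0, 2, 1, 3).reshape(i1 * i2, j1 * j2)

def quantize_uint8(x):
    """Uniform 8-bit quantization (a simplified stand-in for trained quantization)."""
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / 255.0
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, scale, lo

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((64, 256))               # one hypothetical LSTM gate weight matrix
    c1, c2 = tt_decompose_2core(W, in_modes=(8, 8), out_modes=(16, 16), rank=8)
    W_hat = tt_reconstruct(c1, c2)
    q1, s1, z1 = quantize_uint8(c1)                  # quantize one core as a demo
    orig, comp = W.size, c1.size + c2.size
    print(f"params: {orig} -> {comp} ({orig / comp:.1f}x smaller), "
          f"relative reconstruction error {np.linalg.norm(W - W_hat) / np.linalg.norm(W):.3f}")

In practice, a tensor-compressed LSTM would factorize all gate weight matrices this way and fine-tune the cores and quantization levels end to end, which is what allows large compression ratios with little accuracy loss.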

Updated: 2020-05-25