DEEPEYE
ACM Transactions on Embedded Computing Systems (IF 2.8), Pub Date: 2020-05-25, DOI: 10.1145/3381805
Yuan Cheng, Guangya Li, Ngai Wong, Hai-Bao Chen, Hao Yu

Video object detection and action recognition typically require deep neural networks (DNNs) with a huge number of parameters. It is therefore challenging to develop a DNN video comprehension unit on resource-constrained terminal devices. In this article, we introduce a deeply tensor-compressed video comprehension neural network, called DEEPEYE, for inference on terminal devices. Instead of building a Long Short-Term Memory (LSTM) network directly from high-dimensional raw video input, we construct an LSTM-based spatio-temporal model from structured, tensorized time-series features for object detection and action recognition. Deep compression is achieved by tensor decomposition and trained quantization of the time-series feature-based LSTM network. We have implemented DEEPEYE on an ARM-core-based IoT board, running at 31 FPS while consuming only 2.4 W. Using the video datasets MOMENTS, UCF11, and HMDB51 as benchmarks, DEEPEYE achieves 228.1× model compression with only 0.47% mAP reduction, as well as a 15k× parameter reduction with up to 8.01% accuracy improvement over competing approaches.
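The compression described above combines tensor decomposition of the LSTM weights with trained quantization. The sketch below only illustrates that general idea, assuming NumPy, a toy 2-core tensor-train (TT) factorization of a single weight matrix, and plain uniform 8-bit quantization in place of trained quantization; the shapes, rank, and helper names are invented for the example and are not the configuration used in DEEPEYE.

import numpy as np

def tt_decompose_2core(W, in_modes, out_modes, rank):
    """Factor W (I x J) into two TT cores, where I = i1*i2 and J = j1*j2."""
    i1, i2 = in_modes
    j1, j2 = out_modes
    # Group (i1, j1) into the first core and (i2, j2) into the second,
    # then truncate the SVD of the regrouped matrix to the chosen TT rank.
    T = W.reshape(i1, i2, j1, j2).transpose(0, 2, 1, 3).reshape(i1 * j1, i2 * j2)
    U, S, Vt = np.linalg.svd(T, full_matrices=False)
    U, S, Vt = U[:, :rank], S[:rank], Vt[:rank, :]
    core1 = (U * S).reshape(i1, j1, rank)            # shape (i1, j1, r)
    core2 = Vt.reshape(rank, i2, j2)                 # shape (r, i2, j2)
    return core1, core2

def tt_reconstruct(core1, core2):
    """Contract the two cores back into a dense (I x J) matrix."""
    i1, j1, _ = core1.shape
    _, i2, j2 = core2.shape
    T = np.tensordot(core1, core2, axes=([2], [0]))  # (i1, j1, i2, j2)
    return T.transpose(0, 2, 1, 3).reshape(i1 * i2, j1 * j2)

def quantize_uint8(x):
    """Uniform 8-bit quantization (a simplified stand-in for trained quantization)."""
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / 255.0
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, scale, lo

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((64, 256))               # one hypothetical LSTM gate weight matrix
    c1, c2 = tt_decompose_2core(W, in_modes=(8, 8), out_modes=(16, 16), rank=8)
    W_hat = tt_reconstruct(c1, c2)
    q1, s1, z1 = quantize_uint8(c1)                  # quantize one core as a demo
    orig, comp = W.size, c1.size + c2.size
    print(f"params: {orig} -> {comp} ({orig / comp:.1f}x smaller), "
          f"relative reconstruction error {np.linalg.norm(W - W_hat) / np.linalg.norm(W):.3f}")

In practice, a tensor-compressed LSTM would factorize all gate weight matrices this way and fine-tune the cores and quantization levels end to end, which is what allows large compression ratios with little accuracy loss.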

Updated: 2020-05-25