A feature temporal attention based interleaved network for fast video object detection
Journal of Ambient Intelligence and Humanized Computing (IF 3.662), Pub Date: 2021-05-11, DOI: 10.1007/s12652-021-03309-3
Yanni Yang, Huansheng Song, Shijie Sun, Yan Chen, Xinyao Tang, Qin Shi

Object detection in videos is a fundamental technology for applications such as monitoring. Because static detectors treat video frames as independent input images, they ignore the temporal information of objects when detecting objects in videos and perform redundant computation in the detection process. In this paper, based on the spatiotemporal continuity of video objects, we propose an attention-guided dynamic video object detection method for fast detection. We define two frame attributes, key frame and non-key frame, and extract complete or shallow features for them, respectively. Unlike the fixed key-frame strategy used in previous studies, we develop a new key-frame decision method that adaptively determines the attribute of the current frame by measuring the feature similarity between frames. For the shallow features extracted from non-key frames, semantic enhancement and feature temporal attention (FTA) based feature propagation are performed in the designed temporal attention based feature propagation module (TAFPM) to generate high-level semantic features. Our method is evaluated on the ImageNet VID dataset. It runs at 21.53 fps, twice the speed of the base detector R-FCN, with an mAP decline of only 0.2% compared to R-FCN. The proposed method thus achieves performance comparable to state-of-the-art methods that focus on speed.
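The abstract describes an interleaved pipeline: an adaptive key-frame decision based on inter-frame feature similarity, complete feature extraction on key frames, and shallow features plus temporal feature propagation on non-key frames. The sketch below illustrates only that control flow, not the paper's implementation; the cosine-similarity measure, the 0.9 threshold, and the placeholder modules (shallow_net, full_net, propagate, detect_head) are illustrative assumptions.

```
# Minimal sketch (not the authors' code) of an adaptive key-frame,
# interleaved video detection loop as outlined in the abstract.
import torch
import torch.nn.functional as F

def is_new_key_frame(shallow_feat, key_shallow, threshold=0.9):
    """Promote the current frame to a key frame when its shallow-feature
    similarity to the last key frame drops below the threshold.
    shallow_feat, key_shallow: (C, H, W) feature maps. Threshold is assumed."""
    similarity = F.cosine_similarity(shallow_feat.flatten(),
                                     key_shallow.flatten(), dim=0)
    return similarity.item() < threshold

def detect_video(frames, shallow_net, full_net, propagate, detect_head):
    """Interleaved detection: key frames pay for the full backbone;
    non-key frames reuse cached key-frame features via propagation
    (the role TAFPM plays in the paper)."""
    key_feat, key_shallow = None, None
    for frame in frames:
        shallow = shallow_net(frame)                     # cheap shallow features
        if key_feat is None or is_new_key_frame(shallow, key_shallow):
            key_feat = full_net(frame)                   # complete features
            key_shallow = shallow
            feat = key_feat
        else:
            feat = propagate(shallow, key_feat)          # temporal propagation
        yield detect_head(feat)                          # detection on final features
```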



Updated: 2021-05-12