Joint Anchor-Feature Refinement for Real-Time Accurate Object Detection in Images and Videos
IEEE Transactions on Circuits and Systems for Video Technology (IF 8.4), Pub Date: 2021-02-01, DOI: 10.1109/tcsvt.2020.2980876
Xingyu Chen, Junzhi Yu, Shihan Kong, Zhengxing Wu, Li Wen

Object detection has been vigorously investigated for years, yet fast and accurate detection in real-world scenes remains a challenging problem. To overcome the drawbacks of single-stage detectors, we aim at precisely detecting objects in both static and temporal scenes in real time. First, as a dual refinement mechanism, a novel anchor-offset detection is designed, which comprises an anchor refinement, a feature location refinement, and a deformable detection head. This new detection mode simultaneously performs two-step regression and captures accurate object features. Based on the anchor-offset detection, a dual refinement network (DRNet) is developed for high-performance static detection, in which a multi-deformable head is further designed to leverage contextual information for describing objects. For temporal detection in videos, a temporal refinement network (TRNet) and a temporal dual refinement network (TDRNet) are developed by propagating the refinement information across time. We also propose a soft refinement strategy that temporally matches object motion with the previous refinement. The proposed methods are evaluated on the PASCAL VOC, COCO, and ImageNet VID datasets. Extensive comparisons on static and temporal detection verify the superiority of DRNet, TRNet, and TDRNet. Consequently, the developed approaches run at a fairly fast speed while achieving significantly enhanced detection accuracy, i.e., 84.4% mAP on VOC 2007, 83.6% mAP on VOC 2012, 69.4% mAP on VID 2017, and 42.4% AP on COCO. Finally, with encouraging results, our methods are applied to online underwater object detection and grasping with an autonomous system. Codes are publicly available at this https URL.
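
The abstract describes the anchor-offset (dual refinement) mechanism only at a high level. The sketch below is a minimal, hypothetical PyTorch interpretation of that idea, not the authors' released code: all module names (e.g., DualRefinementHead, offset_proj) and the mapping from predicted anchor offsets to deformable-convolution sampling offsets are assumptions made for illustration.

```python
# Minimal sketch of a dual-refinement (anchor-offset) detection head, assuming a
# PyTorch/torchvision setup. Step 1 refines the anchors (first regression); the
# predicted anchor offsets then steer a deformable convolution so that the final
# classification and regression are computed from relocated (refined) features.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class DualRefinementHead(nn.Module):
    def __init__(self, in_channels=256, num_anchors=3, num_classes=21, kernel_size=3):
        super().__init__()
        # Step 1: coarse anchor refinement (4 offsets per anchor: dx, dy, dw, dh).
        self.anchor_refine = nn.Conv2d(in_channels, num_anchors * 4, 3, padding=1)
        # Project anchor offsets to deformable-conv sampling offsets
        # (2 values, x and y, per kernel sampling point); this mapping is an assumption.
        self.offset_proj = nn.Conv2d(num_anchors * 4, 2 * kernel_size * kernel_size, 1)
        # Step 2: feature-location refinement via deformable convolution.
        self.deform = DeformConv2d(in_channels, in_channels, kernel_size,
                                   padding=kernel_size // 2)
        # Final detection heads applied to the refined features.
        self.cls_head = nn.Conv2d(in_channels, num_anchors * num_classes, 3, padding=1)
        self.reg_head = nn.Conv2d(in_channels, num_anchors * 4, 3, padding=1)

    def forward(self, feat):
        anchor_offsets = self.anchor_refine(feat)             # first-step regression
        sampling_offsets = self.offset_proj(anchor_offsets)   # align features with refined anchors
        refined_feat = torch.relu(self.deform(feat, sampling_offsets))
        return anchor_offsets, self.cls_head(refined_feat), self.reg_head(refined_feat)


if __name__ == "__main__":
    head = DualRefinementHead()
    x = torch.randn(1, 256, 40, 40)                # one SSD-style feature map
    a_off, cls_logits, box_deltas = head(x)
    print(a_off.shape, cls_logits.shape, box_deltas.shape)
```

Running the snippet only prints the output shapes of one feature level; in the actual DRNet/TDRNet such heads would be attached to multiple feature levels and, for video, the refinement information would additionally be propagated across frames as described in the abstract.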

Updated: 2021-02-01