STDnet: Exploiting high resolution feature maps for small object detection
Engineering Applications of Artificial Intelligence (IF 8), Pub Date: 2020-03-25, DOI: 10.1016/j.engappai.2020.103615
Brais Bosquet, Manuel Mucientes, Víctor M. Brea

The accuracy of small object detection with convolutional neural networks (ConvNets) lags behind that of larger objects, as can be observed in popular contests like MS COCO. This is in part caused by the lack of specific architectures and of datasets with a sufficiently large number of small objects. Our work addresses these two issues. First, this paper introduces STDnet, a convolutional neural network focused on the detection of small objects, which we define as those under 16 × 16 pixels. The high performance of STDnet is built on a novel early visual attention mechanism, called Region Context Network (RCN), which selects the most promising regions and discards the rest of the input image. Processing only specific areas allows STDnet to keep high resolution feature maps in deeper layers while maintaining low memory overhead and higher frame rates. High resolution feature maps proved to be key to increasing localization accuracy for such small objects. Second, we also present USC-GRAD-STDdb, a video dataset with more than 56,000 annotated small objects in challenging scenarios. Experimental results on USC-GRAD-STDdb show that STDnet improves the AP@.5 of the best state-of-the-art object detectors for small target detection from 50.8% to 57.4%. Performance has also been tested on MS COCO for objects under 16 × 16 pixels. In addition, a spatio-temporal baseline network, STDnet-bST, has been proposed to make use of the information in successive frames, increasing the AP@.5 of STDnet by 2.3%. Finally, optimizations have been carried out to fit the network on embedded devices such as the Jetson TX2.
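
As a rough illustration of two terms used in the abstract, the sketch below shows one possible reading of the 16 × 16 pixel size criterion and the intersection-over-union (IoU) test that underlies AP@.5. The function names, the (x, y, w, h) box layout, and the reading of "under 16 × 16" as both sides being below 16 pixels are assumptions made for illustration; none of them is taken from the paper.

```python
def is_small(box, limit=16):
    """Assumed reading of the size criterion: width and height both under `limit` pixels."""
    _, _, w, h = box
    return w < limit and h < limit


def iou(a, b):
    """Intersection over union of two axis-aligned boxes given as (x, y, w, h)."""
    ax1, ay1, aw, ah = a
    bx1, by1, bw, bh = b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    # Overlap extent along each axis, clamped at zero when the boxes are disjoint.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, truth, threshold=0.5):
    """A prediction counts as correct at AP@.5 when its IoU with the ground truth is at least 0.5."""
    return iou(pred, truth) >= threshold
```

With these definitions, a 12 × 12 prediction shifted horizontally by 5 pixels from its ground truth has an IoU of about 0.41 and no longer counts as a match at the 0.5 threshold, which gives some intuition for why localization accuracy matters so much for objects of this size.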




Updated: 2020-03-25