FIN
ACM Transactions on Multimedia Computing, Communications, and Applications (IF 5.2). Pub Date: 2020-05-25. DOI: 10.1145/3381086
Xiaofan Luo, Fukoeng Wong, Haifeng Hu

Multi-layer detection is a widely used approach in object detection. It extracts multiple feature maps with different resolutions from the backbone network to detect objects of different scales, which effectively copes with scale variation in object detection. Although multi-layer detection distributes the workload of a single detection layer across multiple layers and can improve detection accuracy to some extent, it has two limitations. First, manually assigning anchor boxes of different sizes to different feature maps depends too heavily on human experience. Second, there is a semantic gap between the detection layers: the same detector must simultaneously process detection layers with inconsistent semantic strength, which increases the difficulty of optimizing the detector. In this article, we propose a feature integrated network (FIN) based on single-layer detection to address these problems. Different from existing methods, we design a series of verification experiments based on the multi-layer detection model, which show that a shallow, high-resolution feature map has the potential to detect objects of various scales simultaneously and effectively. Considering that the semantic information of the shallow feature map is weak, we propose two modules to enhance the representation ability of the single detection layer. First, we propose a detection adaptation network (DANet) to extract powerful feature maps that are useful for the object detection task. Second, we combine global context information and local detail information with a verified hourglass module (VHM) to generate a single feature map with high resolution and rich semantic information, so that all anchor boxes can be assigned to this detection layer. In our model, all detection operations are concentrated on a high-resolution feature map whose semantic and detail information are enhanced as much as possible. Therefore, the proposed model solves the problems of anchor assignment and inconsistent semantic strength between multiple detection layers mentioned above. Extensive experiments on the Pattern Analysis, Statistical Modelling and Computational Learning Visual Object Classes (PASCAL VOC) and Microsoft Common Objects in Context (MS COCO) datasets show that our model has good detection performance for objects of various sizes. The proposed model achieves 81.9 mAP when the size of the input image is 300 × 300.
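The abstract stays at a high level, but its two central ideas can be illustrated concretely: fusing a semantically strong, low-resolution feature map into a shallow, high-resolution one, and then placing anchor boxes of every scale on that single fused detection layer instead of spreading them across several layers. The following PyTorch sketch is an illustration under assumptions, not the paper's code: `SingleLayerFusion`, `anchors_on_single_layer`, and all channel sizes, strides, and anchor scales are hypothetical placeholders for the roles played by the paper's DANet and VHM modules.

```python
# Minimal sketch (assumed names and hyperparameters, not the authors' DANet/VHM).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SingleLayerFusion(nn.Module):
    """Merge a deep, semantically strong map (global context) into a shallow,
    high-resolution map (local detail), yielding one detection layer."""

    def __init__(self, shallow_ch=256, deep_ch=512, out_ch=256):
        super().__init__()
        self.lateral = nn.Conv2d(shallow_ch, out_ch, kernel_size=1)   # local detail path
        self.top_down = nn.Conv2d(deep_ch, out_ch, kernel_size=1)     # global context path
        self.smooth = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, shallow, deep):
        # Upsample the deep map to the shallow map's resolution, then add and smooth.
        up = F.interpolate(self.top_down(deep), size=shallow.shape[-2:],
                           mode="bilinear", align_corners=False)
        return self.smooth(self.lateral(shallow) + up)


def anchors_on_single_layer(feat_h, feat_w, stride,
                            scales=(32, 64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Generate anchors of EVERY scale on one feature map.

    In a multi-layer detector each scale is tied to a different feature map;
    here all (scale, ratio) combinations share the same high-resolution grid,
    which is the single-layer anchor assignment the abstract argues for."""
    ys = (torch.arange(feat_h, dtype=torch.float32) + 0.5) * stride
    xs = (torch.arange(feat_w, dtype=torch.float32) + 0.5) * stride
    cy, cx = torch.meshgrid(ys, xs, indexing="ij")
    centers = torch.stack([cx, cy], dim=-1).reshape(-1, 2)            # (H*W, 2)

    boxes = []
    for s in scales:
        for r in ratios:
            w, h = s * r ** 0.5, s / r ** 0.5
            half = torch.tensor([w / 2.0, h / 2.0])
            boxes.append(torch.cat([centers - half, centers + half], dim=1))
    return torch.cat(boxes, dim=0)                                    # (H*W*A, 4), xyxy


if __name__ == "__main__":
    shallow = torch.randn(1, 256, 38, 38)   # e.g. a shallow map for a 300x300 input
    deep = torch.randn(1, 512, 10, 10)      # a deeper, lower-resolution map
    fused = SingleLayerFusion()(shallow, deep)
    anchors = anchors_on_single_layer(38, 38, stride=8)
    print(fused.shape, anchors.shape)       # [1, 256, 38, 38], [17328, 4]
```

With this arrangement, a single detection head operating on `fused` sees one consistent level of semantic strength, and the anchor scales are no longer hand-partitioned across layers; whether the paper realizes the fusion exactly this way is not stated in the abstract.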
