Zoom Out-and-In Network with Map Attention Decision for Region Proposal and Object Detection, International Journal of Computer Vision

Zoom Out-and-In Network with Map Attention Decision for Region Proposal and Object Detection
International Journal of Computer Vision (IF 11.6), Pub Date: 2018-06-20, DOI: 10.1007/s11263-018-1101-7
Hongyang Li, Yu Liu, Wanli Ouyang, Xiaogang Wang

In this paper, we propose a zoom-out-and-in network for generating object proposals. A key observation is that it is difficult to classify anchors of different sizes with the same set of features. Anchors of different sizes should therefore be placed at different depths within the network: smaller boxes on high-resolution layers with a smaller stride, and larger boxes on low-resolution layers with a larger stride. Inspired by the conv/deconv structure, we fully leverage the low-level local details and high-level regional semantics from two feature map streams, which are complementary to each other, to identify the objectness in an image. We further propose a map attention decision (MAD) unit that aggressively searches for neuron activations across the two streams and attends to those that contribute most to feature learning for the final loss. The unit serves as a decision-maker that adaptively activates maps along certain channels with the sole purpose of optimizing the overall training loss. One advantage of MAD is that the learned weights applied to each feature channel are predicted on the fly from the input context, which is more suitable than the fixed weighting of a convolutional kernel. Experimental results on three datasets demonstrate the effectiveness of the proposed algorithm over other state-of-the-art methods, in terms of average recall for region proposal and average precision for object detection.
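The idea of predicting per-channel weights from the input, rather than fixing them in a kernel, can be illustrated with a minimal sketch. The abstract does not say how MAD computes its weights, so everything below is an assumption for illustration only: the context is taken to be a global average over each channel, the gate is a single linear layer followed by a sigmoid, and the names `reweight_streams`, `gate_matrix`, and `gate_bias` are hypothetical. Both streams are assumed to already share the same spatial size.

```python
import math
import random


def global_average_pool(feature_maps):
    """Reduce each channel (a 2-D list of floats) to its mean activation."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]


def predict_channel_weights(context, gate_matrix, gate_bias):
    """Hypothetical gate: one sigmoid weight per channel, computed from the context."""
    weights = []
    for row, bias in zip(gate_matrix, gate_bias):
        z = sum(w * c for w, c in zip(row, context)) + bias
        weights.append(1.0 / (1.0 + math.exp(-z)))
    return weights


def reweight_streams(high_res_stream, low_res_stream, gate_matrix, gate_bias):
    """Concatenate two streams along channels, then scale each channel by an
    input-dependent weight (an assumed stand-in, not the paper's MAD unit)."""
    combined = high_res_stream + low_res_stream  # channel-wise concatenation
    context = global_average_pool(combined)
    weights = predict_channel_weights(context, gate_matrix, gate_bias)
    attended = [
        [[weight * value for value in row] for row in channel]
        for weight, channel in zip(weights, combined)
    ]
    return attended, weights


def random_maps(rng, channels, height, width):
    """Toy feature maps with values in [0, 1]."""
    return [
        [[rng.uniform(0.0, 1.0) for _ in range(width)] for _ in range(height)]
        for _ in range(channels)
    ]


# Toy usage: two channels per stream on a 4x4 grid, with random gate parameters.
rng = random.Random(0)
high_res = random_maps(rng, 2, 4, 4)
low_res = random_maps(rng, 2, 4, 4)
num_channels = len(high_res) + len(low_res)
gate_matrix = [
    [rng.uniform(-1.0, 1.0) for _ in range(num_channels)]
    for _ in range(num_channels)
]
gate_bias = [0.0] * num_channels

attended, weights = reweight_streams(high_res, low_res, gate_matrix, gate_bias)

# Changing the input changes the predicted weights, unlike a fixed kernel.
brighter = [[[2.0 * v for v in row] for row in channel] for channel in high_res]
_, weights_for_brighter = reweight_streams(brighter, low_res, gate_matrix, gate_bias)
```

In this sketch the gate parameters stay fixed while the weights they produce vary with each input, which is the contrast the abstract draws with the fixed weighting of a convolutional kernel.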

Updated: 2018-06-20
