Crafting GBD-Net for Object Detection
IEEE Transactions on Pattern Analysis and Machine Intelligence ( IF 23.6 ) Pub Date : 2017-08-29 , DOI: 10.1109/tpami.2017.2745563
Xingyu Zeng , Wanli Ouyang , Junjie Yan , Hongsheng Li , Tong Xiao , Kun Wang , Yu Liu , Yucong Zhou , Bin Yang , Zhe Wang , Hui Zhou , Xiaogang Wang

The visual cues from multiple support regions of different sizes and resolutions are complementary when classifying a candidate box in object detection. Effectively integrating local and contextual visual cues from these regions has become a fundamental problem in object detection. In this paper, we propose a gated bi-directional CNN (GBD-Net) that passes messages among features from different support regions during both feature learning and feature extraction. Such message passing can be implemented through convolution between neighboring support regions in two directions and can be conducted in various layers. Therefore, local and contextual visual patterns can validate each other's existence by learning their nonlinear relationships, and their close interactions are modeled in a more complex way. We also show that message passing is not always helpful; its usefulness depends on the individual sample. Gate functions are therefore needed to control message transmission, whose on/off states are determined by extra visual evidence from the input sample. The effectiveness of GBD-Net is demonstrated through experiments on three object detection datasets: ImageNet, Pascal VOC2007, and Microsoft COCO. Besides GBD-Net, this paper also details our approach to winning the ImageNet object detection challenge of 2016, with source code provided at https://github.com/craftGBD/craftGBD . The winning system includes the modified GBD-Net, a new pretraining scheme, and better region proposal designs. We also show the effectiveness of different network structures and of existing techniques for object detection, such as multi-scale testing, left-right flipping, bounding box voting, NMS, and context.
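The gated message passing described in the abstract can be sketched as follows. This is a minimal toy illustration, not the paper's exact architecture: the 1x1-convolution simplification (a matrix multiply on per-region feature vectors), the weight names, and the sigmoid gate form are assumptions introduced for illustration. It shows the core idea: a candidate message from one support region to a neighbor, modulated per sample by a gate computed from the same input features.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(x, 0.0)

def gated_message(h_src, W_msg, W_gate):
    """One gated message from a source support region to a neighbor.

    h_src:  (C,) feature vector of the source region (treating the
            message convolution as a 1x1 conv, i.e., a matrix multiply).
    W_msg:  (C, C) hypothetical weights producing the candidate message.
    W_gate: (C, C) hypothetical weights producing the per-channel gate;
            the sigmoid gate in [0, 1] decides, per input sample, how
            much of the message is actually transmitted.
    """
    message = relu(W_msg @ h_src)   # candidate message (non-negative)
    gate = sigmoid(W_gate @ h_src)  # sample-dependent on/off control
    return gate * message

# Toy setup: two support regions with 4-channel features.
rng = np.random.default_rng(0)
C = 4
h_small = rng.normal(size=C)        # features of the smaller region
h_large = rng.normal(size=C)        # features of the larger region
W_msg = rng.normal(size=(C, C))
W_gate = rng.normal(size=(C, C))

# One direction of the bi-directional scheme: the larger region's
# features are refined by the gated message from the smaller region.
# The opposite direction is symmetric with its own weights.
h_large_refined = h_large + gated_message(h_small, W_msg, W_gate)
```

Because the gate is a function of the input features, the network can learn to shut a message off for samples where the neighboring region's context is misleading, which is the behavior the abstract motivates.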

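Among the test-time techniques the abstract lists, greedy non-maximum suppression (NMS) is the standard duplicate-removal step. A minimal sketch follows; the `(x1, y1, x2, y2)` box format and the 0.5 IoU threshold are conventional assumptions, not values taken from the paper.

```python
def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression over candidate boxes.

    boxes:  list of (x1, y1, x2, y2) corner coordinates.
    scores: matching confidence scores.
    Returns indices of kept boxes, highest score first.
    """
    def iou(a, b):
        # Intersection-over-union of two axis-aligned boxes.
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        union = area(a) + area(b) - inter
        return inter / union if union > 0 else 0.0

    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    keep = []
    while order:
        best = order.pop(0)           # highest-scoring remaining box
        keep.append(best)
        # Suppress boxes that overlap the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
# The second box heavily overlaps the first and is suppressed,
# so nms(boxes, scores) returns [0, 2].
```

Bounding box voting, also mentioned in the abstract, is a refinement of this step: instead of discarding suppressed boxes outright, their coordinates are averaged (weighted by score) into the kept box.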
Updated: 2018-08-06