Compositional Convolutional Neural Networks: A Robust and Interpretable Model for Object Recognition Under Occlusion
International Journal of Computer Vision (IF 11.6), Pub Date: 2020-11-24, DOI: 10.1007/s11263-020-01401-3
Adam Kortylewski, Qing Liu, Angtian Wang, Yihong Sun, Alan Yuille

Computer vision systems in real-world applications need to be robust to partial occlusion while also being explainable. In this work, we show that black-box deep convolutional neural networks (DCNNs) have only limited robustness to partial occlusion. We overcome these limitations by unifying DCNNs with part-based models into Compositional Convolutional Neural Networks (CompositionalNets), an interpretable deep architecture with innate robustness to partial occlusion. Specifically, we propose to replace the fully connected classification head of DCNNs with a differentiable compositional model that can be trained end-to-end. The structure of the compositional model enables CompositionalNets to decompose images into objects and context, and to further decompose object representations in terms of individual parts and the object's pose. The generative nature of our compositional model enables it to localize occluders and to recognize objects based on their non-occluded parts. We conduct extensive image classification and object detection experiments on images of artificially occluded objects from the PASCAL3D+ and ImageNet datasets, and on real images of partially occluded vehicles from the MS-COCO dataset. Our experiments show that CompositionalNets built from several popular DCNN backbones (VGG-16, ResNet50, ResNext) improve by a large margin over their non-compositional counterparts at classifying and detecting partially occluded objects. Furthermore, they can localize occluders accurately despite being trained with class-level supervision only. Finally, we demonstrate that CompositionalNets provide human-interpretable predictions, as their individual components can be understood as detecting parts and estimating an object's viewpoint.
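As a concrete illustration of the core idea above (keeping a DCNN backbone's convolutional features and swapping its fully connected classification head for a differentiable, generative, part-based head), here is a minimal PyTorch sketch. It is not the authors' released implementation: the head below uses a simplified vMF-like part similarity, per-class spatial mixture templates standing in for pose, and a single generic occluder score, and names such as CompositionalHead, num_parts, num_mixtures, and occluder_logit are illustrative assumptions rather than identifiers from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision


class CompositionalHead(nn.Module):
    """Simplified part-based likelihood head (illustrative sketch).

    Feature vectors at each spatial position are compared against learned
    unit-norm part kernels (a vMF-like cosine similarity). Per-class mixture
    components hold spatial part-activation templates (a crude stand-in for
    pose), and a generic occluder score lets occluded positions opt out of
    the object model.
    """

    def __init__(self, in_channels, num_classes, num_parts=64, num_mixtures=4):
        super().__init__()
        self.parts = nn.Parameter(torch.randn(num_parts, in_channels))
        # Log-templates: per class, per mixture, per part, over a fixed 7x7 grid.
        self.templates = nn.Parameter(
            torch.zeros(num_classes, num_mixtures, num_parts, 7, 7)
        )
        # Learned score for the generic occluder model (assumed, for illustration).
        self.occluder_logit = nn.Parameter(torch.tensor(0.0))

    def forward(self, feats):                       # feats: (B, C, H, W)
        feats = F.adaptive_avg_pool2d(feats, 7)     # fixed 7x7 grid for templates
        f = F.normalize(feats, dim=1)
        k = F.normalize(self.parts, dim=1)
        sims = torch.einsum('bchw,pc->bphw', f, k)  # vMF-like part similarities
        part_post = F.softmax(sims * 10.0, dim=1)   # soft part assignment (fixed temp.)
        logw = F.log_softmax(self.templates, dim=2)  # normalize templates over parts
        # Expected log-likelihood per class/mixture at each spatial position.
        obj = torch.einsum('bphw,kmphw->bkmhw', part_post, logw)
        # Occlusion handling: each position is explained by either the object
        # part model or the occluder model; marginalize over the two.
        occ = self.occluder_logit.expand_as(obj)
        pos = torch.logaddexp(obj, occ)
        score = pos.sum(dim=(-1, -2))               # sum over the spatial grid
        return score.max(dim=2).values              # best mixture (pose) per class


# Drop VGG-16's fully connected classifier, keep its convolutional features.
backbone = torchvision.models.vgg16().features
model = nn.Sequential(backbone, CompositionalHead(512, num_classes=12))

x = torch.randn(2, 3, 224, 224)
logits = model(x)                                   # (2, 12) class scores
```

The logaddexp over object and occluder scores is what lets this sketch explain occluded positions with the occluder term and score the object from its visible parts only, mirroring the robustness mechanism the abstract describes.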

Updated: 2020-11-24