
Deep Multicameral Decoding for Localizing Unoccluded Object Instances from a Single RGB Image

Published in: International Journal of Computer Vision

Abstract

Occlusion-aware instance-sensitive segmentation is a complex task generally decomposed into region-based segmentations by approximating each instance with its bounding box. We address the showcase scenario of dense homogeneous layouts, in which this approximation does not hold. In this scenario, outlining unoccluded instances by decoding a deep encoder becomes difficult, due to the translation invariance of convolutional layers and the lack of complexity in the decoder. We therefore propose a multicameral design composed of subtask-specific lightweight decoder and encoder–decoder units, coupled in cascade to encourage subtask-specific feature reuse and to enforce a learning path within the decoding process. Furthermore, the state-of-the-art datasets for occlusion-aware instance segmentation contain real images with few instances, and their occlusions mostly stem from objects occluding the background, unlike dense object layouts. We thus also introduce Mikado, a synthetic dataset of dense homogeneous object layouts that extensibly contains more instances and inter-instance occlusions per image than these public datasets. Our extensive experiments on Mikado and public datasets show that ordinal multiscale units within the decoding process prove more effective than state-of-the-art design patterns for capturing position-sensitive representations. We also show that Mikado is plausible with respect to real-world problems, in the sense that it enables the learning of performance-enhancing representations transferable to real images, while drastically reducing the need for hand-made annotations for finetuning. The proposed dataset is publicly available.
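
To make the cascaded decoding idea concrete, here is a minimal PyTorch-style sketch of two subtask-specific decoder units coupled in cascade. The module granularity, channel widths, sigmoid outputs, and the boundary-then-unoccluded-side subtask split are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Sequential):
    """Two 3x3 convolutions with ReLU, at a fixed channel width."""
    def __init__(self, c_in, c_out):
        super().__init__(
            nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
        )

class CascadedDecoding(nn.Module):
    """Illustrative cascade: a shared encoder feeds a lightweight unit
    for instance boundaries, whose features are reused by a second unit
    predicting the unoccluded boundary sides."""
    def __init__(self, encoder, feat_ch=64):
        super().__init__()
        self.encoder = encoder
        self.boundary_unit = ConvBlock(feat_ch, feat_ch)
        self.boundary_out = nn.Conv2d(feat_ch, 1, 1)
        self.side_unit = ConvBlock(2 * feat_ch, feat_ch)
        self.side_out = nn.Conv2d(feat_ch, 1, 1)

    def forward(self, x):
        f = self.encoder(x)                   # shared features
        fb = self.boundary_unit(f)            # subtask-1 features
        boundaries = torch.sigmoid(self.boundary_out(fb))
        # Subtask 2 reuses both shared and subtask-1 features.
        fs = self.side_unit(torch.cat([f, fb], dim=1))
        sides = torch.sigmoid(self.side_out(fs))
        return boundaries, sides

# Usage with a toy single-stage encoder producing 64-channel features:
encoder = ConvBlock(3, 64)
model = CascadedDecoding(encoder)
boundaries, sides = model(torch.rand(1, 3, 128, 128))
```

The key point is the concatenation feeding the second unit: its features are computed from both the shared encoder features and the first unit's features, encouraging subtask-specific feature reuse along an explicit learning path within the decoding process.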


Notes

  1. Publicly available at https://mikado.liris.cnrs.fr

  2. https://images.google.com/

References

  • Antoniou, A., Storkey, A. J., & Edwards, H. (2018). Augmenting image classifiers using data augmentation generative adversarial networks. In International conference on artificial neural networks and machine learning (ICANN) (Vol. 11141, pp. 594–603). Lecture notes in computer science, Springer.

  • Ayvaci, A., Raptis, M., & Soatto, S. (2010). Occlusion detection and motion estimation with convex optimization. In Advances in neural information processing systems (NIPS) (pp. 100–108).

  • Ayvaci, A., Raptis, M., & Soatto, S. (2012). Sparse occlusion detection with optical flow. International Journal of Computer Vision (IJCV), 97(3), 322–338.

  • Badrinarayanan, V., Kendall, A., & Cipolla, R. (2017). SegNet: A deep convolutional encoder–decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 39(12), 2481–2495.

  • Bai, M., & Urtasun, R. (2017). Deep watershed transform for instance segmentation. In Conference on computer vision and pattern recognition (CVPR) (pp. 2858–2866). IEEE Computer Society.

  • Batra, A., Singh, S., Pang, G., Basu, S., Jawahar, C., & Paluri, M. (2019). Improved road connectivity by joint learning of orientation and segmentation. In Conference on computer vision and pattern recognition (CVPR) (pp. 10385–10393). Computer Vision Foundation/IEEE.

  • Ben-David, S., Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., & Vaughan, J. W. (2010a). A theory of learning from different domains. Machine Learning, 79(1–2), 151–175.

  • Ben-David, S., Lu, T., Luu, T., & Pál, D. (2010b). Impossibility theorems for domain adaptation. In International conference on artificial intelligence and statistics (AISTATS), JMLR.org, JMLR proceedings (Vol. 9, pp. 129–136).

  • Blender Online Community. (2016). Blender—a 3D modelling and rendering package. Blender Foundation, Blender Institute, Amsterdam, http://www.blender.org.

  • Brégier, R., Devernay, F., Leyrit, L., & Crowley, J. L. (2017). Symmetry aware evaluation of 3d object detection and pose estimation in scenes of many parts in bulk. In International conference on computer vision workshops (ICCVW) (pp. 2209–2218). IEEE Computer Society.

  • Caesar, H., Uijlings, J. R. R., & Ferrari, V. (2018). COCO-Stuff: Thing and stuff classes in context. In Conference on computer vision and pattern recognition (CVPR) (pp. 1209–1218). IEEE Computer Society.

  • Cai, H., Zhu, L., & Han, S. (2019). ProxylessNAS: Direct neural architecture search on target task and hardware. In International conference on learning representations (ICLR).

  • Chen, L. C., Zhu, Y., Papandreou, G., Schroff, F., & Adam, H. (2018). Encoder–decoder with atrous separable convolution for semantic image segmentation. In European conference on computer vision (ECCV) part VII (Vol. 11211, pp. 833–851). Lecture notes in computer science, Springer.

  • Cubuk, E. D., Zoph, B., Mane, D., Vasudevan, V., & Le, Q. V. (2019). AutoAugment: learning augmentation strategies from data. In Conference on computer vision and pattern recognition (CVPR) (pp. 113–123). Computer Vision Foundation/IEEE.

  • Dai, J., He, K., & Sun, J. (2016). Instance-aware semantic segmentation via multi-task network cascades. In Conference on computer vision and pattern recognition (CVPR) (pp. 3150–3158). IEEE Computer Society.

  • Deng, R., Shen, C., Liu, S., Wang, H., & Liu, X. (2018). Learning to predict crisp boundaries. In European conference on computer vision (ECCV) part VI (Vol. 11210, pp. 570–586). Lecture notes in computer science, Springer.

  • Do, T. T., Nguyen, A., & Reid, I. D. (2018). AffordanceNet: An end-to-end deep learning approach for object affordance detection. In International conference on robotics and automation (ICRA) (pp. 1–5). IEEE.

  • Dong, X., Yan, Y., Ouyang, W., & Yang, Y. (2018). Style aggregated network for facial landmark detection. In Conference on computer vision and pattern recognition (CVPR) (pp. 379–388). IEEE Computer Society.

  • Eigen, D., Puhrsch, C., & Fergus, R. (2014). Depth map prediction from a single image using a multi-scale deep network. In Advances in neural information processing systems (NIPS) (pp. 2366–2374).

  • Everingham, M., Eslami, S. M. A., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2015). The pascal visual object classes challenge: A retrospective. International Journal of Computer Vision (IJCV), 111(1), 98–136.

  • Fan, R., Cheng, M. M., Hou, Q., Mu, T. J., Wang, J., & Hu, S. M. (2019). S4Net: Single stage salient-instance segmentation. In Conference on computer vision and pattern recognition (CVPR) (pp. 6103–6112). Computer Vision Foundation/IEEE.

  • Follmann, P., Böttger, T., Härtinger, P., König, R., & Ulrich, M. (2018). MVTec D2S: Densely segmented supermarket dataset. In European conference on computer vision (ECCV) part X (Vol. 11214, pp. 581–597). Lecture notes in computer science, Springer.

  • Follmann, P., König, R., Härtinger, P., Klostermann, M., & Böttger, T. (2019). Learning to see the invisible: End-to-end trainable amodal instance segmentation. In Winter conference on applications of computer vision, (WACV) (pp. 1328–1336). IEEE.

  • Fu, H., Gong, M., Wang, C., Batmanghelich, K., & Tao, D. (2018). Deep ordinal regression network for monocular depth estimation. In Conference on computer vision and pattern recognition (CVPR) (pp. 2002–2011). IEEE Computer Society.

  • Fu, H., Wang, C., Tao, D., & Black, M. J. (2016). Occlusion boundary detection via deep exploration of context. In Conference on computer vision and pattern recognition (CVPR) (pp. 241–250). IEEE Computer Society.

  • Gaidon, A., Wang, Q., Cabon, Y., & Vig, E. (2016). Virtual worlds as proxy for multi-object tracking analysis. In Conference on computer vision and pattern recognition (CVPR), IEEE Computer Society.

  • Gan, Y., Xu, X., Sun, W., & Lin, L. (2018). Monocular depth estimation with affinity, vertical pooling, and label enhancement. In European conference on computer vision (ECCV) part III (Vol. 11207, pp. 232–247). Lecture notes in computer science, Springer.

  • Geiger, A., Lenz, P., Stiller, C., & Urtasun, R. (2013). Vision meets robotics: The KITTI dataset. International Journal of Robotics Research (IJRR), 32(11), 1231–1237.

  • Geiger, D., Ladendorf, B., & Yuille, A. L. (1995). Occlusions and binocular stereo. International Journal of Computer Vision (IJCV), 14(3), 211–226.

  • Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In International conference on artificial intelligence and statistics (AISTATS), JMLR.org, JMLR proceedings (Vol. 9, pp. 249–256)

  • Grammalidis, N., & Strintzis, M. G. (1998). Disparity and occlusion estimation in multiocular systems and their coding for the communication of multiview image sequences. Transactions on Circuits and Systems for Video Technology (TCSVT), 8(3), 328–344.

  • Grard, M., Brégier, R., Sella, F., Dellandréa, E., & Chen, L. (2018). Object segmentation in depth maps with one user click and a synthetically trained fully convolutional network. In 2017 international workshop on human-friendly robotics (Vol. 7, pp. 207–221). Springer proceedings in advanced robotics, Springer.

  • Guan, S., Khan, A. A., Sikdar, S., & Chitnis, P. V. (2018). Fully dense UNet for 2D sparse photoacoustic tomography artifact removal. IEEE Journal of Biomedical and Health Informatics.

  • Hayder, Z., He, X., & Salzmann, M. (2017). Boundary-aware instance segmentation. In Conference on computer vision and pattern recognition (CVPR) (pp. 587–595). IEEE Computer Society.

  • He, K., Gkioxari, G., Dollár, P., & Girshick, R. B. (2017). Mask R-CNN. In International conference on computer vision (ICCV) (pp. 2980–2988). IEEE Computer Society.

  • He, X., & Yuille, A. (2010). Occlusion boundary detection using pseudo-depth. In European conference on computer vision (ECCV) part IV (Vol. 6314, pp. 539–552). Lecture notes in computer science, Springer.

  • Huang, G., Liu, Z., van der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In Conference on computer vision and pattern recognition (CVPR) (pp. 2261–2269). IEEE Computer Society.

  • Humayun, A., Mac Aodha, O., & Brostow, G. J. (2011). Learning to find occlusion regions. In Conference on computer vision and pattern recognition (CVPR) (pp. 2161–2168). IEEE Computer Society.

  • Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., & Darrell, T. (2014). Caffe: Convolutional architecture for fast feature embedding. In International conference on multimedia (pp. 675–678). ACM, MM’14.

  • Kendall, A., Gal, Y., & Cipolla, R. (2018). Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In Conference on computer vision and pattern recognition (CVPR) (pp. 7482–7491). IEEE Computer Society.

  • Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In International conference on learning representations (ICLR).

  • Kirillov, A., Levinkov, E., Andres, B., Savchynskyy, B., & Rother, C. (2017). InstanceCut: From edges to instances with multicut. In Conference on computer vision and pattern recognition (CVPR) (pp. 7322–7331). IEEE Computer Society.

  • Kirillov, A., Wu, Y., He, K., & Girshick, R. B. (2019). PointRend: Image segmentation as rendering. CoRR, arXiv:1912.08193.

  • Kong, S., & Fowlkes, C. C. (2018). Recurrent pixel embedding for instance grouping. In Conference on computer vision and pattern recognition (CVPR) (pp. 9018–9028). IEEE Computer Society.

  • Lee, W., Na, J., & Kim, G. (2019). Multi-task self-supervised object detection via recycling of bounding box annotations. In Conference on computer vision and pattern recognition (CVPR) (pp. 4984–4993). Computer Vision Foundation/IEEE.

  • Li, B., Shen, C., Dai, Y., van den Hengel, A., & He, M. (2015). Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs. In Conference on computer vision and pattern recognition (CVPR) (pp. 1119–1127). IEEE Computer Society.

  • Li, G., Xie, Y., Lin, L., & Yu, Y. (2017). Instance-level salient object segmentation. In Conference on computer vision and pattern recognition (CVPR) (pp. 247–256). IEEE Computer Society.

  • Lin, T. Y., Goyal, P., Girshick, R. B., He, K., & Dollár, P. (2017). Focal loss for dense object detection. In International conference on computer vision (ICCV) (pp. 2999–3007). IEEE Computer Society.

  • Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In European conference on computer vision (ECCV) Part V (Vol. 8693, pp. 740–755). Lecture notes in computer science, Springer.

  • Liu, F., Shen, C., Lin, G., & Reid, I. D. (2016). Learning depth from single monocular images using deep convolutional neural fields. IEEE Transactions on Pattern Analysis Machine Intelligence (TPAMI), 38(10), 2024–2039.

  • Liu, G., Si, J., Hu, Y., & Li, S. (2018a). Photographic image synthesis with improved U-net. In International conference on advanced computational intelligence (ICACI) (pp. 402–407). IEEE.

  • Liu, R., Lehman, J., Molino, P., Such, F. P., Frank, E., Sergeev, A., & Yosinski, J. (2018b). An intriguing failing of convolutional neural networks and the coordconv solution. In Advances in neural information processing systems (NeurIPS) (pp. 9628–9639).

  • Liu, S., Johns, E., & Davison, A. J. (2019). End-to-end multi-task learning with attention. In Conference on computer vision and pattern recognition (CVPR) (pp. 1871–1880). Computer Vision Foundation/IEEE.

  • Liu, S., Qi, L., Qin, H., Shi, J., & Jia, J. (2018c). Path aggregation network for instance segmentation. In Conference on computer vision and pattern recognition (CVPR) (pp. 8759–8768). IEEE Computer Society.

  • Liu, Y., Cheng, M. M., Hu, X., Wang, K., & Bai, X. (2017). Richer convolutional features for edge detection. In Conference on computer vision and pattern recognition (CVPR) (pp. 5872–5881). IEEE Computer Society.

  • Luo, P., Wang, G., Lin, L., & Wang, X. (2017). Deep dual learning for semantic image segmentation. In International conference on computer vision (ICCV) (pp. 2737–2745). IEEE Computer Society.

  • Maninis, K. K., Pont-Tuset, J., Arbeláez, P. A., & Gool, L. J. V. (2016). Convolutional oriented boundaries. In European conference on computer vision (ECCV) part I (Vol. 9905, pp. 580–596). Lecture notes in computer science, Springer.

  • Martin, D., Fowlkes, C., Tal, D., & Malik, J. (2001). A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In International conference on computer vision (ICCV) (pp. 416–423). IEEE Computer Society.

  • McCormac, J., Handa, A., Leutenegger, S., & Davison, A. J. (2017). SceneNet RGB-D: Can 5M synthetic images beat generic imagenet pre-training on indoor segmentation? In International conference on computer vision (ICCV) (pp. 2697–2706). IEEE Computer Society.

  • Misra, I., Shrivastava, A., Gupta, A., & Hebert, M. (2016). Cross-stitch networks for multi-task learning. In Conference on computer vision and pattern recognition (CVPR) (pp. 3994–4003). IEEE Computer Society.

  • Novotný, D., Albanie, S., Larlus, D., & Vedaldi, A. (2018). Semi-convolutional operators for instance segmentation. In European conference on computer vision (ECCV) part I (Vol. 11205, pp. 89–105). Lecture notes in computer science, Springer.

  • Pont-Tuset, J., Arbelaez, P., Barron, J. T., Marqués, F., & Malik, J. (2017). Multiscale combinatorial grouping for image segmentation and object proposal generation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 39(1), 128–140.

  • Qi, L., Jiang, L., Liu, S., Shen, X., & Jia, J. (2019). Amodal instance segmentation with KINS dataset. In Conference on computer vision and pattern recognition (CVPR) (pp. 3014–3023). Computer Vision Foundation/IEEE.

  • Ren, M., & Zemel, R. S. (2017). End-to-end instance segmentation with recurrent attention. In Conference on computer vision and pattern recognition (CVPR) (pp. 293–301). IEEE Computer Society.

  • Ren, X., Fowlkes, C. C., & Malik, J. (2006). Figure/ground assignment in natural images. In European conference on computer vision (ECCV) part II (Vol. 3952, pp. 614–627). Lecture notes in computer science, Springer.

  • Romera-Paredes, B., & Torr, P. H. S. (2016). Recurrent instance segmentation. In European conference on computer vision (ECCV) part VI (Vol. 9910, pp. 312–329). Lecture notes in computer science, Springer.

  • Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In Medical image computing and computer-assisted intervention (MICCAI) (Vol. 9351, pp. 234–241). Lecture notes in computer science, Springer.

  • Ros, G., Sellart, L., Materzynska, J., Vázquez, D., & López, A. M. (2016). The SYNTHIA dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In Conference on computer vision and pattern recognition (CVPR) (pp. 3234–3243). IEEE Computer Society.

  • Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), 115(3), 211–252.

  • Shi, W., Caballero, J., Huszar, F., Totz, J., Aitken, A. P., Bishop, R., Rueckert, D., & Wang, Z. (2016). Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Conference on computer vision and pattern recognition (CVPR) (pp. 1874–1883). IEEE Computer Society.

  • Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In International conference on learning representations (ICLR).

  • Stein, A., & Hebert, M. (2006). Local detection of occlusion boundaries in video. In British machine vision conference (BMVC).

  • Sun, D., Liu, C., & Pfister, H. (2014). Local layering for joint motion estimation and occlusion detection. In Conference on computer vision and pattern recognition (CVPR) (pp. 1098–1105). IEEE Computer Society.

  • Tang, Z., Peng, X., Geng, S., Wu, L., Zhang, S., & Metaxas, D. N. (2018). Quantized densely connected U-Nets for efficient landmark localization. In European conference on computer vision (ECCV) part III (Vol. 11207, pp. 348–364). Lecture notes in computer science, Springer.

  • Wang, G., Wang, X., Li, F. W. B., & Liang, X. (2018a). DOOBNet: Deep object occlusion boundary detection from an image. In Asian conference on computer vision (ACCV) part VI (Vol. 11366, pp. 686–702). Lecture notes in computer science, Springer.

  • Wang, P., & Yuille, A. L. (2016). DOC: Deep occlusion estimation from a single image. In European conference on computer vision (ECCV) part I (Vol. 9905, pp. 545–561). Lecture notes in computer science, Springer.

  • Wang, P., Chen, P., Yuan, Y., Liu, D., Huang, Z., Hou, X., & Cottrell, G. W. (2018b). Understanding convolution for semantic segmentation. In Winter conference on applications of computer vision (WACV) (pp. 1451–1460).

  • Wang, Y., Zhao, X., & Huang, K. (2017). Deep crisp boundaries. In Conference on computer vision and pattern recognition (CVPR) (pp. 1724–1732). IEEE Computer Society.

  • Williams, O., Isard, M., & MacCormick, J. (2011). Estimating disparity and occlusions in stereo video sequences. In Conference on computer vision and pattern recognition (CVPR) (pp. 250–257). IEEE Computer Society.

  • Xie, S., & Tu, Z. (2015). Holistically-nested edge detection. In International conference on computer vision (ICCV) (pp. 1395–1403). IEEE Computer Society.

  • Yang, J., Price, B. L., Cohen, S., Lee, H., & Yang, M. H. (2016). Object contour detection with a fully convolutional encoder–decoder network. In Conference on computer vision and pattern recognition (CVPR). IEEE Computer Society.

  • Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks? In Advances in neural information processing systems (NIPS) (pp. 3320–3328).

  • Yu, F., & Koltun, V. (2016). Multi-scale context aggregation by dilated convolutions. In International conference on learning representations (ICLR).

  • Yu, J., Yang, L., Xu, N., Yang, J., & Huang, T. (2019). Slimmable neural networks. In International conference on learning representations (ICLR).

  • Yu, Z., Liu, W., Zou, Y., Feng, C., Ramalingam, S., Kumar, B. V. K. V., & Kautz, J. (2018). Simultaneous edge alignment and learning. In European conference on computer vision (ECCV) part III (Vol. 11207, pp. 400–417). Lecture notes in computer science, Springer.

  • Zhang, L., Li, X., Arnab, A., Yang, K., Tong, Y., & Torr, P. H. (2019). Dual graph convolutional network for semantic segmentation. In British machine vision conference (BMVC).

  • Zhu, Y., Tian, Y., Metaxas, D. N., & Dollár, P. (2017). Semantic amodal segmentation. In Conference on computer vision and pattern recognition (CVPR) (pp. 3001–3009). IEEE Computer Society.

  • Zitnick, C. L., & Kanade, T. (2000). A cooperative algorithm for stereo matching and occlusion detection. IEEE Transactions on Pattern Analysis Machine Intelligence (TPAMI), 22(7), 675–684.


Acknowledgements

We thank Romain Brégier, Florian Sella, and the anonymous reviewers for their insightful comments and suggestions that helped us to greatly improve this article.

Author information

Corresponding author

Correspondence to Matthieu Grard.

Additional information

Communicated by Anelia Angelova, Gustavo Carneiro, Niko Sünderhauf, Jürgen Leitner.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 21102 KB)

Appendix

In this section, we provide additional comparisons, results and materials (Figs. 15, 16, 17, 18, 19 and 20) related to Sects. 4 and 5.

Fig. 15

Comparative results for instance boundary (blue) and unoccluded boundary side (orange) detection on COCOA. From top to bottom: input (a), ground truth (b), inference by amodal instance segmentation (Zhu et al. 2017) (c), and inference using a bicameral structure (d). Unlike the proposed approach, region proposal-based detection qualitatively leads to coarse segmentations and missed instances. Best viewed in color (Color figure online)

Fig. 16

Supplementary material on the proposed synthetic data generation pipeline

Fig. 17

Comparative results for occlusion-aware boundary detection on PIOD and Mikado. Best viewed in color (Color figure online)

Fig. 18

Comparative results for occlusion-aware boundary detection on PIOD and Mikado, using a bicameral structure: with and without skip connections, with different types of skip connections, and with different encoder backbones. Best viewed in color (Color figure online)

Fig. 19

Training (solid) and test (dashed) errors for instance boundary (top) and occluding boundary side (bottom) detection on PIOD (left) and Mikado (right) using different network architectures. Lower boundary and occlusion errors are reached when boundaries and occlusions are learned jointly (green, blue, yellow, purple) rather than independently (red). Best viewed in color (Color figure online)
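
As a hedged illustration of the joint training compared in Fig. 19, the sketch below sums per-subtask binary cross-entropies so that both subtasks drive shared features; the uniform weighting and tensor shapes are assumptions for illustration, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

# Stand-ins for network outputs and ground truth, shaped (B, 1, H, W).
# In practice `boundaries` and `sides` would come from one network
# with a shared encoder, so both losses would update common features.
boundaries = torch.rand(2, 1, 64, 64, requires_grad=True)
sides = torch.rand(2, 1, 64, 64, requires_grad=True)
gt_boundaries = torch.randint(0, 2, (2, 1, 64, 64)).float()
gt_sides = torch.randint(0, 2, (2, 1, 64, 64)).float()

# Joint objective: one backward pass covers both subtasks.
loss = (F.binary_cross_entropy(boundaries, gt_boundaries)
        + F.binary_cross_entropy(sides, gt_sides))
loss.backward()
```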

Fig. 20

Comparative performance of a bicameral structure on D2SA under different pretraining conditions. Performance on both boundaries and occlusions is maximized when the first three encoder blocks, pretrained on Mikado, are frozen at finetuning time. Best viewed in color (Color figure online)
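
The winning pretraining condition can be sketched as follows, assuming a PyTorch-style model; the four-stage stand-in encoder and optimizer settings are illustrative, and only the freezing of the first three blocks mirrors the reported setup.

```python
import torch
import torch.nn as nn

# Stand-in encoder with four stages; in practice these would be the
# Mikado-pretrained encoder blocks of the network being finetuned.
encoder = nn.Sequential(*[
    nn.Sequential(nn.Conv2d(3 if i == 0 else 16, 16, 3, padding=1),
                  nn.ReLU(inplace=True))
    for i in range(4)
])

# Freeze the first three blocks so only the remaining layers adapt
# to the real images (e.g. D2SA) during finetuning.
for block in list(encoder.children())[:3]:
    for p in block.parameters():
        p.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in encoder.parameters() if p.requires_grad), lr=1e-4)
```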


About this article


Cite this article

Grard, M., Dellandréa, E. & Chen, L. Deep Multicameral Decoding for Localizing Unoccluded Object Instances from a Single RGB Image. Int J Comput Vis 128, 1331–1359 (2020). https://doi.org/10.1007/s11263-020-01323-0
