
Manhattan Room Layout Reconstruction from a Single \(360^{\circ }\) Image: A Comparative Study of State-of-the-Art Methods

International Journal of Computer Vision

Abstract

Recent approaches for predicting layouts from \(360^{\circ }\) panoramas produce excellent results. These approaches build on a common framework consisting of three steps: a pre-processing step based on edge-based alignment, prediction of layout elements, and a post-processing step that fits a 3D layout to the predicted elements. Until now, it has been difficult to compare these methods due to multiple differing design decisions, such as the encoding network (e.g., SegNet or ResNet), the type of elements predicted (e.g., corners, wall/floor boundaries, or semantic segmentation), and the method of fitting the 3D layout. To address this challenge, we summarize and describe the common framework, its variants, and the impact of the design decisions. For a complete evaluation, we also propose extended annotations for the Matterport3D dataset (Chang et al., 2017) and introduce two depth-based evaluation metrics.
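To make the shared three-step structure concrete, here is a minimal, self-contained sketch in Python. It is an illustration only, not any of the compared methods: the function names, the flat stand-in boundaries, and the 1.6 m camera height are assumptions made for this example.

    # Minimal sketch of the common three-step framework on a toy panorama.
    # All names and stand-in logic are illustrative assumptions, not the
    # implementation of any compared method.
    import numpy as np

    def align_to_manhattan(pano):
        # Step 1 (pre-processing): edge-based alignment would rotate the
        # panorama so detected vanishing directions match the Manhattan
        # axes. Stand-in: assume the input is already aligned.
        return pano

    def predict_layout_elements(pano):
        # Step 2: a network (e.g., with a SegNet or ResNet encoder)
        # predicts layout elements. Stand-in: flat per-column wall/ceiling
        # and wall/floor boundaries as normalized image rows in [0, 1].
        width = pano.shape[1]
        return np.full(width, 0.25), np.full(width, 0.75)

    def fit_3d_layout(ceiling, floor, camera_height=1.6):
        # Step 3 (post-processing): lift the boundaries to a 3D Manhattan
        # layout. In an equirectangular image, row v maps to latitude
        # (0.5 - v) * pi.
        depression = (floor - 0.5) * np.pi    # angle down to floor edge
        elevation = (0.5 - ceiling) * np.pi   # angle up to ceiling edge
        wall_dist = camera_height / np.tan(depression)  # per-column depth
        room_height = camera_height + wall_dist * np.tan(elevation)
        return wall_dist, float(room_height.mean())

    pano = np.zeros((512, 1024, 3))
    ceiling, floor = predict_layout_elements(align_to_manhattan(pano))
    wall_dist, height = fit_3d_layout(ceiling, floor)
    print(f"wall distance: {wall_dist[0]:.2f} m, room height: {height:.2f} m")

With the flat toy boundaries this prints a 1.60 m wall distance and a 3.20 m room height; the methods compared in this study differ mainly in how step 2 parameterizes the layout elements and how step 3 enforces the Manhattan constraints.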



Notes

  1. Code is available at: https://github.com/zouchuhang/LayoutNetv2.

  2. Code is available at: https://github.com/SunDaDenny/DuLa-Net.

  3. Our annotation is available at: https://github.com/ericsujw/Matterport3DLayoutAnnotation.

  4. We revised the SGD-based optimization implemented by Sun (using different loss term weights): https://github.com/sunset1995/pytorch-layoutnet. A generic sketch of this style of refinement appears after this list.
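Note 4 describes gradient-based refinement with re-weighted loss terms. The Python sketch below shows the general pattern only, under invented assumptions (a flat-ceiling/flat-floor parameterization, L1 loss terms, and arbitrary weights); it does not reproduce the referenced pytorch-layoutnet implementation.

    # Generic sketch of SGD-based layout refinement with weighted loss
    # terms. The parameterization and weights are invented for illustration.
    import torch

    def refine_layout(pred_ceiling, pred_floor, steps=200, lr=1e-2,
                      w_ceil=1.0, w_floor=1.0):
        # Refine flat ceiling/floor heights against per-column boundary
        # predictions (1D tensors of normalized image rows) by minimizing
        # a weighted sum of L1 terms with SGD.
        ceil_h = pred_ceiling.mean().detach().clone().requires_grad_(True)
        floor_h = pred_floor.mean().detach().clone().requires_grad_(True)
        opt = torch.optim.SGD([ceil_h, floor_h], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = (w_ceil * (ceil_h - pred_ceiling).abs().mean()
                    + w_floor * (floor_h - pred_floor).abs().mean())
            loss.backward()
            opt.step()
        return ceil_h.item(), floor_h.item()

    # Usage: noisy boundary predictions for a 1024-column panorama.
    pred_c = 0.25 + 0.02 * torch.randn(1024)
    pred_f = 0.75 + 0.02 * torch.randn(1024)
    print(refine_layout(pred_c, pred_f))

Changing w_ceil and w_floor is the kind of loss-term re-weighting the note refers to.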

References

  • Armeni, I., Sax, A., Zamir, A. R., & Savarese, S. (2017). Joint 2D–3D-semantic data for indoor scene understanding. arXiv:1702.01105.

  • Cabral, R., & Furukawa, Y. (2014). Piecewise planar and compact floorplan reconstruction from images. In CVPR (pp. 628–635).

  • Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., & Zhang, Y. (2017). Matterport3D: Learning from RGB-D data in indoor environments. arXiv:1709.06158.

  • Coughlan, J. M., & Yuille, A. L. (1999). Manhattan world: Compass direction from a single image by Bayesian inference. In ICCV, IEEE (vol. 2, pp. 941–947).

  • Dasgupta, S., Fang, K., Chen, K., & Savarese, S. (2016). DeLay: Robust spatial layout estimation for cluttered indoor scenes. In CVPR (pp. 616–624).

  • Del Pero, L., Bowdish, J., Fried, D., Kermgard, B., Hartley, E., & Barnard, K. (2012). Bayesian geometric modeling of indoor scenes. In CVPR (pp. 2719–2726).

  • Del Pero, L., Bowdish, J., Kermgard, B., Hartley, E., & Barnard, K. (2013). Understanding Bayesian rooms using composite 3D object models. In CVPR (pp. 153–160).

  • Delage, E., Lee, H., & Ng, A. Y. (2006). A dynamic Bayesian network model for autonomous 3D reconstruction from a single indoor image. In CVPR, IEEE (vol. 2, pp. 2418–2428).

  • Flint, A., Mei, C., Murray, D., & Reid, I. (2010). A dynamic programming approach to reconstructing building interiors. In European conference on computer vision (pp. 394–407). Springer.

  • Fukano, K., Mochizuki, Y., Iizuka, S., Simo-Serra, E., Sugimoto, A., & Ishikawa, H. (2016). Room reconstruction from a single spherical image by higher-order energy minimization. In 2016 23rd international conference on pattern recognition (ICPR) (pp. 1768–1773).

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778).

  • Hedau, V., Hoiem, D., & Forsyth, D. (2009). Recovering the spatial layout of cluttered rooms. In ICCV.

  • Hedau, V., Hoiem, D., & Forsyth, D. (2010). Thinking inside the box: Using appearance models and context based on room geometry. In ECCV (pp. 224–237).

  • Hoiem, D., Efros, A. A., & Hebert, M. (2005). Geometric context from a single image. In ICCV, IEEE (vol. 1, pp. 654–661).

  • Hoiem, D., Efros, A. A., & Hebert, M. (2007). Recovering surface layout from an image. International Journal of Computer Vision, 75(1), 151–172.

  • Izadinia, H., Shan, Q., & Seitz, S. M. (2017). IM2CAD. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5134–5143).

  • Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv:1412.6980.

  • Lee, C. Y., Badrinarayanan, V., Malisiewicz, T., & Rabinovich, A. (2017). RoomNet: End-to-end room layout estimation. In Proceedings of the IEEE international conference on computer vision (pp. 4865–4874).

  • Lee, D., Gupta, A., Hebert, M., & Kanade, T. (2010). Estimating spatial layout of rooms using volumetric reasoning about objects and surfaces. In NIPS (pp. 1288–1296).

  • Lee, D. C., Hebert, M., & Kanade, T. (2009). Geometric reasoning for single image structure recovery. In CVPR (pp. 2136–2143). IEEE.

  • Liu, C., Kohli, P., & Furukawa, Y. (2016). Layered scene decomposition via the occlusion-CRF. In CVPR (pp. 165–173).

  • Liu, C., Schwing, A. G., Kundu, K., Urtasun, R., & Fidler, S. (2015). Rent3D: Floor-plan priors for monocular layout estimation. In CVPR (pp. 3413–3421).

  • Liu, C., Wu, J., & Furukawa, Y. (2018). FloorNet: A unified framework for floorplan reconstruction from 3D scans. In Proceedings of the European conference on computer vision (ECCV) (pp. 201–217).

  • Mallya, A., & Lazebnik, S. (2015). Learning informative edge maps for indoor scene layout prediction. In ICCV (pp. 936–944).

  • Monszpart, A., Mellado, N., Brostow, G. J., & Mitra, N. J. (2015). RAPter: Rebuilding man-made scenes with regular arrangements of planes. ACM Transactions on Graphics, 34(4), 103.

  • Newcombe, R. A., Izadi, S., Hilliges, O., Molyneaux, D., Kim, D., Davison, A. J., et al. (2011). KinectFusion: Real-time dense surface mapping and tracking. ISMAR, 11, 127–136.

  • Pintore, G., Garro, V., Ganovelli, F., Gobbetti, E., & Agus, M. (2016). Omnidirectional image capture on mobile devices for fast automatic generation of 2.5D indoor maps. In 2016 IEEE winter conference on applications of computer vision (WACV) (pp. 1–9). IEEE.

  • Ramalingam, S., & Brand, M. (2013). Lifting 3D Manhattan lines from a single image. In Proceedings of the IEEE international conference on computer vision (pp. 497–504).

  • Ramalingam, S., Pillai, J. K., Jain, A., & Taguchi, Y. (2013). Manhattan junction catalogue for spatial reasoning of indoor scenes. In CVPR (pp. 3065–3072).

  • Ren, Y., Li, S., Chen, C., & Kuo, C. C. J. (2016). A coarse-to-fine indoor layout estimation (CFILE) method. In Asian conference on computer vision (pp. 36–51). Springer.

  • Robbins, H., & Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, 22, 400–407.

  • Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In International conference on medical image computing and computer-assisted intervention (pp. 234–241). Springer.

  • Schwing, A. G., Hazan, T., Pollefeys, M., & Urtasun, R. (2012). Efficient structured prediction for 3D indoor scene understanding. In 2012 IEEE conference on computer vision and pattern recognition (pp. 2815–2822). IEEE.

  • Schwing, A. G., & Urtasun, R. (2012). Efficient exact inference for 3D indoor scene understanding. In European conference on computer vision (pp. 299–313). Springer.

  • Sun, C., Hsiao, C. W., Sun, M., & Chen, H. T. (2019). HorizonNet: Learning room layout with 1D representation and pano stretch data augmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1047–1056).

  • Von Gioi, R. G., Jakubowicz, J., Morel, J. M., & Randall, G. (2008). LSD: A fast line segment detector with a false detection control. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(4), 722–732.

  • Wang, F. E., Yeh, Y. H., Sun, M., Chiu, W. C., & Tsai, Y. H. (2020). LayoutMP3D: Layout annotation of Matterport3D. arXiv:2003.13516.

  • Xu, J., Stenger, B., Kerola, T., & Tung, T. (2017). Pano2CAD: Room layout from a single panorama image. In 2017 IEEE winter conference on applications of computer vision (WACV) (pp. 354–362). IEEE.

  • Yang, H., & Zhang, H. (2016). Efficient 3D room shape recovery from a single panorama. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5422–5430).

  • Yang, S. T., Peng, C. H., Wonka, P., & Chu, H. K. (2018a). PanoAnnotator: A semi-automatic tool for indoor panorama layout annotation. In SIGGRAPH Asia 2018 posters (p. 34). ACM.

  • Yang, S. T., Wang, F. E., Peng, C. H., Wonka, P., Sun, M., & Chu, H. K. (2019). DuLa-Net: A dual-projection network for estimating room layouts from a single RGB panorama. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3363–3372).

  • Yang, Y., Jin, S., Liu, R., Bing Kang, S., & Yu, J. (2018b). Automatic 3D indoor scene modeling from single panorama. In The IEEE conference on computer vision and pattern recognition (CVPR).

  • Zhang, J., Kan, C., Schwing, A. G., & Urtasun, R. (2013). Estimating the 3D layout of indoor scenes and its clutter from depth sensors. In ICCV (pp. 1273–1280).

  • Zhang, Y., Song, S., Tan, P., & Xiao, J. (2014). PanoContext: A whole-room 3D context model for panoramic scene understanding. In European conference on computer vision (pp. 668–686). Springer.

  • Zhao, H., Lu, M., Yao, A., Guo, Y., Chen, Y., & Zhang, L. (2017). Physics inspired optimization on semantic transfer features: An alternative method for room layout estimation. In The IEEE conference on computer vision and pattern recognition (CVPR).

  • Zhao, Y., & Zhu, S. C. (2013). Scene parsing by integrating function, geometry and appearance models. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3119–3126).

  • Zou, C., Colburn, A., Shan, Q., & Hoiem, D. (2018). LayoutNet: Reconstructing the 3D room layout from a single RGB image. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2051–2059).

  • Zou, C., Guo, R., Li, Z., & Hoiem, D. (2019). Complete 3D scene parsing from an RGB-D image. International Journal of Computer Vision, 127(2), 143–162.


Acknowledgements

This research is supported in part by ONR MURI Grant N00014-16-1-2007, an iStaging Corp. fund, and the Ministry of Science and Technology of Taiwan (108-2218-E-007-050- and 107-2221-E-007-088-MY3). We thank Shang-Ta Yang for providing the source code of DuLa-Net, and Cheng Sun for providing the source code of HorizonNet and for helping run experiments on our provided dataset.

Author information

Corresponding author: Chuhang Zou.

Additional information

Communicated by Kristen Grauman.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Qi Shan: This work was not done while at Apple.


About this article


Cite this article

Zou, C., Su, JW., Peng, CH. et al. Manhattan Room Layout Reconstruction from a Single \(360^{\circ }\) Image: A Comparative Study of State-of-the-Art Methods. Int J Comput Vis 129, 1410–1431 (2021). https://doi.org/10.1007/s11263-020-01426-8

