
Manhattan Room Layout Reconstruction from a Single \(360^{\circ }\) Image: A Comparative Study of State-of-the-Art Methods

International Journal of Computer Vision

Abstract

Recent approaches for predicting layouts from \(360^{\circ }\) panoramas produce excellent results. These approaches build on a common framework consisting of three steps: a pre-processing step based on edge-based alignment, prediction of layout elements, and a post-processing step that fits a 3D layout to the predicted elements. Until now, it has been difficult to compare these methods due to multiple differing design decisions, such as the encoding network (e.g., SegNet or ResNet), the type of elements predicted (e.g., corners, wall/floor boundaries, or semantic segmentation), and the method of fitting the 3D layout. To address this challenge, we summarize and describe the common framework, its variants, and the impact of the design decisions. For a complete evaluation, we also propose extended annotations for the Matterport3D dataset (Chang et al., 2017) and introduce two depth-based evaluation metrics.
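To make the shared three-step structure concrete, here is a minimal, self-contained sketch in Python. It is an illustration only, not any of the compared methods: the function names, the flat stand-in boundaries, and the 1.6 m camera height are assumptions made for this example.

    # Minimal sketch of the common three-step framework on a toy panorama.
    # All names and stand-in logic are illustrative assumptions, not the
    # implementation of any compared method.
    import numpy as np

    def align_to_manhattan(pano):
        # Step 1 (pre-processing): edge-based alignment would rotate the
        # panorama so detected vanishing directions match the Manhattan
        # axes. Stand-in: assume the input is already aligned.
        return pano

    def predict_layout_elements(pano):
        # Step 2: a network (e.g., with a SegNet or ResNet encoder)
        # predicts layout elements. Stand-in: flat per-column wall/ceiling
        # and wall/floor boundaries as normalized image rows in [0, 1].
        width = pano.shape[1]
        return np.full(width, 0.25), np.full(width, 0.75)

    def fit_3d_layout(ceiling, floor, camera_height=1.6):
        # Step 3 (post-processing): lift the boundaries to a 3D Manhattan
        # layout. In an equirectangular image, row v maps to latitude
        # (0.5 - v) * pi.
        depression = (floor - 0.5) * np.pi    # angle down to floor edge
        elevation = (0.5 - ceiling) * np.pi   # angle up to ceiling edge
        wall_dist = camera_height / np.tan(depression)  # per-column depth
        room_height = camera_height + wall_dist * np.tan(elevation)
        return wall_dist, float(room_height.mean())

    pano = np.zeros((512, 1024, 3))
    ceiling, floor = predict_layout_elements(align_to_manhattan(pano))
    wall_dist, height = fit_3d_layout(ceiling, floor)
    print(f"wall distance: {wall_dist[0]:.2f} m, room height: {height:.2f} m")

With the flat toy boundaries this prints a 1.60 m wall distance and a 3.20 m room height; the methods compared in this study differ mainly in how step 2 parameterizes the layout elements and how step 3 enforces the Manhattan constraints.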



Notes

  1. Code is available at: https://github.com/zouchuhang/LayoutNetv2.

  2. Code is available at: https://github.com/SunDaDenny/DuLa-Net.

  3. Our annotation is available at: https://github.com/ericsujw/Matterport3DLayoutAnnotation.

  4. We revised the SGD-based optimization implemented by Sun (using different loss term weights): https://github.com/sunset1995/pytorch-layoutnet. A generic sketch of this style of refinement appears after this list.
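Note 4 describes gradient-based refinement with re-weighted loss terms. The Python sketch below shows the general pattern only, under invented assumptions (a flat-ceiling/flat-floor parameterization, L1 loss terms, and arbitrary weights); it does not reproduce the referenced pytorch-layoutnet implementation.

    # Generic sketch of SGD-based layout refinement with weighted loss
    # terms. The parameterization and weights are invented for illustration.
    import torch

    def refine_layout(pred_ceiling, pred_floor, steps=200, lr=1e-2,
                      w_ceil=1.0, w_floor=1.0):
        # Refine flat ceiling/floor heights against per-column boundary
        # predictions (1D tensors of normalized image rows) by minimizing
        # a weighted sum of L1 terms with SGD.
        ceil_h = pred_ceiling.mean().detach().clone().requires_grad_(True)
        floor_h = pred_floor.mean().detach().clone().requires_grad_(True)
        opt = torch.optim.SGD([ceil_h, floor_h], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = (w_ceil * (ceil_h - pred_ceiling).abs().mean()
                    + w_floor * (floor_h - pred_floor).abs().mean())
            loss.backward()
            opt.step()
        return ceil_h.item(), floor_h.item()

    # Usage: noisy boundary predictions for a 1024-column panorama.
    pred_c = 0.25 + 0.02 * torch.randn(1024)
    pred_f = 0.75 + 0.02 * torch.randn(1024)
    print(refine_layout(pred_c, pred_f))

Changing w_ceil and w_floor is the kind of loss-term re-weighting the note refers to.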

References

  • Armeni, I., Sax, A., Zamir, A. R., & Savarese, S. (2017). Joint 2D–3D-semantic data for indoor scene understanding. arXiv:1702.01105.

  • Cabral, R., & Furukawa, Y. (2014). Piecewise planar and compact floorplan reconstruction from images. In CVPR (pp. 628–635).

  • Chang, A., Dai, A., Funkhouser, T., Halber, M., Niessner, M., Savva, M., Song, S., Zeng, A., & Zhang, Y. (2017). Matterport3D: Learning from RGB-D data in indoor environments. arXiv:1709.06158.

  • Coughlan, J. M., & Yuille, A. L. (1999). Manhattan world: Compass direction from a single image by Bayesian inference. In ICCV, IEEE (vol. 2, pp. 941–947).

  • Dasgupta, S., Fang, K., Chen, K., & Savarese, S. (2016). DeLay: Robust spatial layout estimation for cluttered indoor scenes. In CVPR (pp. 616–624).

  • Del Pero, L., Bowdish, J., Fried, D., Kermgard, B., Hartley, E., & Barnard, K. (2012). Bayesian geometric modeling of indoor scenes. In CVPR (pp. 2719–2726).

  • Del Pero, L., Bowdish, J., Kermgard, B., Hartley, E., & Barnard, K. (2013). Understanding Bayesian rooms using composite 3D object models. In CVPR (pp. 153–160).

  • Delage, E., Lee, H., & Ng, A. Y. (2006). A dynamic Bayesian network model for autonomous 3D reconstruction from a single indoor image. In CVPR, IEEE (vol. 2, pp. 2418–2428).

  • Flint, A., Mei, C., Murray, D., & Reid, I. (2010). A dynamic programming approach to reconstructing building interiors. In European conference on computer vision (pp. 394–407). Springer.

  • Fukano, K., Mochizuki, Y., Iizuka, S., Simo-Serra, E., Sugimoto, A., & Ishikawa, H. (2016). Room reconstruction from a single spherical image by higher-order energy minimization. In 2016 23rd international conference on pattern recognition (ICPR) (pp. 1768–1773).

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778).

  • Hedau, V., Hoiem, D., & Forsyth, D. (2009). Recovering the spatial layout of cluttered rooms. In ICCV.

  • Hedau, V., Hoiem, D., & Forsyth, D. (2010). Thinking inside the box: Using appearance models and context based on room geometry. In ECCV (pp. 224–237).

  • Hoiem, D., Efros, A. A., & Hebert, M. (2005). Geometric context from a single image. In ICCV, IEEE (vol. 1, pp. 654–661).

  • Hoiem, D., Efros, A. A., & Hebert, M. (2007). Recovering surface layout from an image. International Journal of Computer Vision, 75(1), 151–172.

  • Izadinia, H., Shan, Q., & Seitz, S. M. (2017). IM2CAD. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5134–5143).

  • Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv:1412.6980.

  • Lee, C. Y., Badrinarayanan, V., Malisiewicz, T., & Rabinovich, A. (2017). RoomNet: End-to-end room layout estimation. In Proceedings of the IEEE international conference on computer vision (pp. 4865–4874).

  • Lee, D., Gupta, A., Hebert, M., & Kanade, T. (2010). Estimating spatial layout of rooms using volumetric reasoning about objects and surfaces. In NIPS (pp. 1288–1296).

  • Lee, D. C., Hebert, M., & Kanade, T. (2009). Geometric reasoning for single image structure recovery. In CVPR (pp. 2136–2143). IEEE.

  • Liu, C., Kohli, P., & Furukawa, Y. (2016). Layered scene decomposition via the occlusion-CRF. In CVPR (pp. 165–173).

  • Liu, C., Schwing, A. G., Kundu, K., Urtasun, R., & Fidler, S. (2015). Rent3D: Floor-plan priors for monocular layout estimation. In CVPR (pp. 3413–3421).

  • Liu, C., Wu, J., & Furukawa, Y. (2018). FloorNet: A unified framework for floorplan reconstruction from 3D scans. In Proceedings of the European conference on computer vision (ECCV) (pp. 201–217).

  • Mallya, A., & Lazebnik, S. (2015). Learning informative edge maps for indoor scene layout prediction. In ICCV (pp. 936–944).

  • Monszpart, A., Mellado, N., Brostow, G. J., & Mitra, N. J. (2015). RAPter: Rebuilding man-made scenes with regular arrangements of planes. ACM Transactions on Graphics, 34(4), 103.

  • Newcombe, R. A., Izadi, S., Hilliges, O., Molyneaux, D., Kim, D., Davison, A. J., et al. (2011). KinectFusion: Real-time dense surface mapping and tracking. ISMAR, 11, 127–136.

  • Pintore, G., Garro, V., Ganovelli, F., Gobbetti, E., & Agus, M. (2016). Omnidirectional image capture on mobile devices for fast automatic generation of 2.5D indoor maps. In 2016 IEEE winter conference on applications of computer vision (WACV) (pp. 1–9). IEEE.

  • Ramalingam, S., & Brand, M. (2013). Lifting 3D Manhattan lines from a single image. In Proceedings of the IEEE international conference on computer vision (pp. 497–504).

  • Ramalingam, S., Pillai, J. K., Jain, A., & Taguchi, Y. (2013). Manhattan junction catalogue for spatial reasoning of indoor scenes. In CVPR (pp. 3065–3072).

  • Ren, Y., Li, S., Chen, C., & Kuo, C. C. J. (2016). A coarse-to-fine indoor layout estimation (CFILE) method. In Asian conference on computer vision (pp. 36–51). Springer.

  • Robbins, H., & Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, 22, 400–407.

  • Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In International conference on medical image computing and computer-assisted intervention (pp. 234–241). Springer.

  • Schwing, A. G., Hazan, T., Pollefeys, M., & Urtasun, R. (2012). Efficient structured prediction for 3D indoor scene understanding. In 2012 IEEE conference on computer vision and pattern recognition (pp. 2815–2822). IEEE.

  • Schwing, A. G., & Urtasun, R. (2012). Efficient exact inference for 3D indoor scene understanding. In European conference on computer vision (pp. 299–313). Springer.

  • Sun, C., Hsiao, C. W., Sun, M., & Chen, H. T. (2019). HorizonNet: Learning room layout with 1D representation and pano stretch data augmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1047–1056).

  • Von Gioi, R. G., Jakubowicz, J., Morel, J. M., & Randall, G. (2008). LSD: A fast line segment detector with a false detection control. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(4), 722–732.

  • Wang, F. E., Yeh, Y. H., Sun, M., Chiu, W. C., & Tsai, Y. H. (2020). LayoutMP3D: Layout annotation of Matterport3D. arXiv:2003.13516.

  • Xu, J., Stenger, B., Kerola, T., & Tung, T. (2017). Pano2CAD: Room layout from a single panorama image. In 2017 IEEE winter conference on applications of computer vision (WACV) (pp. 354–362). IEEE.

  • Yang, H., & Zhang, H. (2016). Efficient 3D room shape recovery from a single panorama. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5422–5430).

  • Yang, S. T., Peng, C. H., Wonka, P., & Chu, H. K. (2018a). PanoAnnotator: A semi-automatic tool for indoor panorama layout annotation. In SIGGRAPH Asia 2018 posters (p. 34). ACM.

  • Yang, S. T., Wang, F. E., Peng, C. H., Wonka, P., Sun, M., & Chu, H. K. (2019). DuLa-Net: A dual-projection network for estimating room layouts from a single RGB panorama. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3363–3372).

  • Yang, Y., Jin, S., Liu, R., Bing Kang, S., & Yu, J. (2018b). Automatic 3D indoor scene modeling from single panorama. In The IEEE conference on computer vision and pattern recognition (CVPR).

  • Zhang, J., Kan, C., Schwing, A. G., & Urtasun, R. (2013). Estimating the 3D layout of indoor scenes and its clutter from depth sensors. In ICCV (pp. 1273–1280).

  • Zhang, Y., Song, S., Tan, P., & Xiao, J. (2014). PanoContext: A whole-room 3D context model for panoramic scene understanding. In European conference on computer vision (pp. 668–686). Springer.

  • Zhao, H., Lu, M., Yao, A., Guo, Y., Chen, Y., & Zhang, L. (2017). Physics inspired optimization on semantic transfer features: An alternative method for room layout estimation. In The IEEE conference on computer vision and pattern recognition (CVPR).

  • Zhao, Y., & Zhu, S. C. (2013). Scene parsing by integrating function, geometry and appearance models. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3119–3126).

  • Zou, C., Colburn, A., Shan, Q., & Hoiem, D. (2018). LayoutNet: Reconstructing the 3D room layout from a single RGB image. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2051–2059).

  • Zou, C., Guo, R., Li, Z., & Hoiem, D. (2019). Complete 3D scene parsing from an RGB-D image. International Journal of Computer Vision, 127(2), 143–162.


Acknowledgements

This research is supported in part by ONR MURI Grant N00014-16-1-2007, an iStaging Corp. fund, and the Ministry of Science and Technology of Taiwan (108-2218-E-007-050- and 107-2221-E-007-088-MY3). We thank Shang-Ta Yang for providing the source code of DuLa-Net, and Cheng Sun for providing the source code of HorizonNet and for helping run experiments on our provided dataset.

Author information

Corresponding author: Chuhang Zou.

Additional information

Communicated by Kristen Grauman.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Qi Shan: This work was not done while at Apple.


About this article


Cite this article

Zou, C., Su, JW., Peng, CH. et al. Manhattan Room Layout Reconstruction from a Single \(360^{\circ }\) Image: A Comparative Study of State-of-the-Art Methods. Int J Comput Vis 129, 1410–1431 (2021). https://doi.org/10.1007/s11263-020-01426-8

