An anchor-based graph method for detecting and classifying indoor objects from cluttered 3D point clouds

https://doi.org/10.1016/j.isprsjprs.2020.12.007

Abstract

Most existing 3D indoor object classification methods have achieved impressive results under the assumption that all objects are oriented upward with respect to the ground. To relax this assumption, great effort has been made to handle arbitrarily oriented objects in terrestrial laser scanning (TLS) point clouds. As one of the most promising solutions, anchor-based graphs can be used to classify freely oriented objects. However, this approach suffers from missed anchor detection, since valid detection relies heavily on the completeness of an anchor’s point cloud and is sensitive to missing data. This paper presents an anchor-based graph method to detect and classify arbitrarily oriented indoor objects. The anchors of each object are extracted from the structurally adjacent relationships among parts instead of from the parts’ geometric metrics. With adjacency, an anchor can be correctly extracted even with missing parts, since the adjacency between an anchor and other parts is retained irrespective of the area extent of the considered parts. The best graph matching is achieved by finding the optimal corresponding node pairs in a super-graph with fully connected nodes based on maximum likelihood. The performance of the proposed method is evaluated with three indicators (object precision, object recall and object F1-score) on seven datasets. The experimental tests demonstrate its effectiveness on TLS point clouds, RGBD point clouds and panoramic RGBD point clouds, with scores of approximately 0.8 for object precision and recall and over 0.9 for chair precision and table recall.

Introduction

Fast and stable detection and classification of indoor objects from scanned point clouds have been instrumental in many applications, such as autonomous vehicles (Mattausch et al., 2014, Meyer et al., 2019, Naseer et al., 2018), indoor reconstruction (Kang et al., 2020, Li et al., 2018a, Sharif et al., 2017, Wang et al., 2017), robotics (Breuer et al., 2011, Li et al., 2020) and city planning (Rui et al., 2018, Vosselman et al., 2017, Yousefhussien et al., 2018). Moreover, recent advances in scanning technology greatly accelerate data acquisition (Gupta et al., 2015, Rui et al., 2018, Yulan et al., 2014) and improve the accuracy of the scanned point cloud (Mattausch et al., 2014, Ochmann et al., 2019, Wang et al., 2017, Zolanvari et al., 2018). These achievements have contributed to the flourishing of the study and development of 3D indoor object detection and classification for point clouds.

Many methods (Czerniawski et al., 2018, Günther et al., 2017, Mattausch et al., 2014, Nan et al., 2012, Valero et al., 2016, Li et al., 2018b, Verdoja et al., 2017) have been presented for detecting tables, chairs and bookcases as indoor objects from scanned point clouds by various geometry-based means in cluttered indoor scenes (Mattausch et al., 2014, Wang et al., 2017). Despite the progress made by those methods, one inherent defect is that their success and applicability depend on the assumption that all indoor objects are oriented upward with respect to the ground or, more restrictively, perpendicular to the ground (Czerniawski et al., 2018, Günther et al., 2017, Mattausch et al., 2014, Nan et al., 2012, Tchapmi et al., 2017, Valero et al., 2016, Wang et al., 2017, Li et al., 2018b, Qi et al., 2017, Verdoja et al., 2017). For a complicated indoor scene or environment where cluttered and occluded objects are not upward-oriented, those methods cannot detect and classify indoor objects with oblique or even curved surfaces, which violate the underlying assumption.

In addition to the abovementioned semantic methodology, substantial attention has recently been paid to deep learning. As reviewed by Guo et al. (2020), the deep learning methodology, including multi-view-based (Tatarchenko et al., 2018, Su et al., 2015), voxel-based (Choy et al., 2019, Tchapmi et al., 2017) and point-based (Liang et al., 2019, Qi et al., 2020, Jiang et al., 2019, Li et al., 2018b, Qi et al., 2016, Qi et al., 2017, Qi et al., 2019) approaches, provides a promising mechanism for semantically detecting and classifying indoor objects. However, its effectiveness depends on the richness of the training data: only rich training data can produce satisfactory segmentation and classification. For scenes with arbitrarily oriented objects, it is not easy to acquire all the necessary training data, since the varying orientation or pose of an object generates a vastly different range of discriminative features to be captured by the learning mechanism. For such scenes, the semantic methodology shows its merit, as semantic structures can be embedded in the process of detecting and classifying indoor objects with arbitrary orientations. Therefore, finding structural and geometric features that are independent of object orientation warrants further exploration.

The graph-based semantic methodology has been proposed to overcome this notable limitation in the field of indoor modelling from point clouds (Shi et al., 2015, Spina, 2015, Wang et al., 2016). To represent indoor objects precisely and concisely, a graph approach based on functional parts (referred to as anchors) (Spina, 2015, Wang et al., 2016) captures indoor objects via a priori segmented patches, describing each object as a graph formed by connecting anchors with other parts. As observed by (Fu et al., 2008, Laga et al., 2013, Nan et al., 2012, Wang et al., 2016), there is a strong correlation between the geometric shape and the upward orientation of anchors in man-made objects. In light of such observations, the local coordinate system (LCS) in each graph can be refined by the anchor’s normal vector and position, which has proven reliable for classifying arbitrarily oriented objects in TLS point clouds. As the availability of those methods depends heavily on the extracted anchors, a prominent flaw is that anchors are extracted as planar primitives according to a definition that relies largely on their metric area. For example, a chair with two anchors (the seat and the back) in Fig. 1a may appear with non-planar anchors as shown in Fig. 1b or with anchors missing some parts as shown in Fig. 1c; in both cases anchor extraction fails. Furthermore, those methods assume that all legs of indoor chairs or tables can be segmented as cylinders, which is also sensitive to missing parts. Since missing parts of an indoor object are usual and unavoidable in scanned point clouds, especially in RGBD datasets (Dong et al., 2018), finding a method that can both accommodate non-upward orientations and extract accurate anchors remains a critical issue in the detection and classification of indoor objects.
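The adjacency idea can be illustrated with a minimal sketch (not the authors' implementation): the anchor of an object is taken as the patch adjacent to the most other parts, so it is found even when its own point cloud, and hence its area, is incomplete. The patch names and the `adjacency` structure below are hypothetical.

```python
# Sketch: select an anchor patch by structural adjacency rather than area.
# The data and function names here are illustrative, not the paper's code.

def extract_anchor(adjacency):
    """Return the patch id with the most adjacent parts.

    adjacency: dict mapping patch id -> set of adjacent patch ids.
    The anchor is the patch touching the most other parts, so it is
    recovered even when its own point cloud (and area) is incomplete.
    """
    return max(adjacency, key=lambda p: len(adjacency[p]))

# A chair: the seat touches the back and four legs; the seat's area
# never enters the criterion, so partial scans do not break it.
chair = {
    "seat": {"back", "leg1", "leg2", "leg3", "leg4"},
    "back": {"seat"},
    "leg1": {"seat"}, "leg2": {"seat"}, "leg3": {"seat"}, "leg4": {"seat"},
}
print(extract_anchor(chair))  # seat
```

Even if, say, half of the seat's points are missing, its adjacency to the back and legs is unchanged, so the same anchor is selected.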

This paper presents an anchor-based graph method to detect and classify arbitrarily oriented indoor objects. The contribution of this study is twofold: anchors are extracted from the structurally adjacent relationships among parts in each object instead of from the parts’ metric area, and the anchor-based graphs are matched by globally optimizing all pairing subgraphs in a super-graph with fully connected nodes. The optimal matching based on maximum likelihood outperforms the approximations achieved by many previous techniques (Armeni et al., 2016, Liang et al., 2019, Spina, 2015). With the adjacency relationship, an anchor can be correctly extracted even with missing parts, since the adjacency between an anchor and other parts is retained irrespective of the area extent of the considered parts. The graph’s edges are reconstructed by connecting an anchor with its adjacent parts, which ensures the conciseness of the reconstructed graph.
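The maximum-likelihood matching can be sketched, under simplifying assumptions, as an exhaustive search over one-to-one node pairings that maximizes the summed log-likelihood of pairwise similarities. The toy `similarity` function and node features below are hypothetical, not the paper's formulation.

```python
# Sketch: maximum-likelihood graph matching by brute-force enumeration.
# Feasible only for small graphs; the paper's super-graph optimization
# is more general, and the similarity model here is made up.
import itertools
import math

def best_matching(nodes_a, nodes_b, similarity):
    """Return the one-to-one pairing of nodes_a onto nodes_b that
    maximizes the total log-likelihood of pairwise similarities.
    similarity(a, b) must return a score in (0, 1]."""
    best, best_score = None, float("-inf")
    for perm in itertools.permutations(nodes_b, len(nodes_a)):
        score = sum(math.log(similarity(a, b)) for a, b in zip(nodes_a, perm))
        if score > best_score:
            best, best_score = list(zip(nodes_a, perm)), score
    return best

# Toy node features (e.g. a scalar shape descriptor per patch):
feat_a = {"seat": 1.0, "back": 0.5}
feat_b = {"p1": 0.95, "p2": 0.52}
sim = lambda a, b: math.exp(-abs(feat_a[a] - feat_b[b]))
match = best_matching(["seat", "back"], ["p1", "p2"], sim)
# [('seat', 'p1'), ('back', 'p2')]
```

Because the log turns the product of likelihoods into a sum, the globally optimal pairing is found exactly, which is the sense in which such matching improves on greedy approximations.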

The remainder of this paper is organized as follows. Related works are presented in Section 2. The proposed method of indoor object classification is described in Section 3, followed by experiments conducted with real-world datasets in Section 4, and an evaluation is provided in Section 5. Section 6 outlines the conclusions drawn from the previous discussions.

Section snippets

Related works

The current methods for object detection and classification can be classified into two groups: semantic methodology and deep learning methodology. The semantic methodology classifies indoor objects via pre-defined features of indoor objects, while the deep learning methodology implements object segmentation and classification by discriminative features learnt from the pre-labelled training samples.

Overview

Given a raw point set of an indoor scene, object detection and classification can be defined as segmenting and classifying indoor objects. The entire process consists of five main steps: patch segmentation, graph reconstruction, approximate clustering via the anchor’s geometric shape, graph clustering via a super-graph, and object refinement, as depicted in Fig. 2.

The input point clouds are first partitioned into a collection of patches using the efficient random sample consensus (RANSAC)
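A minimal RANSAC plane fit of the kind used for patch segmentation can be sketched as follows; the thresholds, iteration count and synthetic data are illustrative, not the paper's settings.

```python
# Sketch: fit one plane to a point cloud with RANSAC (illustrative only).
import numpy as np

def ransac_plane(points, n_iters=200, threshold=0.02, seed=0):
    """Return (normal, d, inlier_mask) for the plane n·x + d = 0
    supported by the most points within `threshold` of the plane."""
    rng = np.random.default_rng(seed)
    best_count, best = -1, None
    for _ in range(n_iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:               # degenerate (collinear) sample
            continue
        n = n / norm
        d = -n @ p0
        mask = np.abs(points @ n + d) < threshold
        if mask.sum() > best_count:
            best_count, best = mask.sum(), (n, d, mask)
    return best

# Synthetic scene: 100 points on the floor z = 0 plus 20 clutter points.
rng = np.random.default_rng(1)
floor = np.column_stack([rng.uniform(size=100), rng.uniform(size=100), np.zeros(100)])
clutter = np.column_stack([rng.uniform(size=20), rng.uniform(size=20), rng.uniform(1, 2, 20)])
n, d, inliers = ransac_plane(np.vstack([floor, clutter]))
```

In practice the extraction is run repeatedly, removing each plane's inliers, until no sufficiently supported plane remains, yielding the collection of patches.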

Experimental setup

The implementation details of the experiments, including the specification of benchmark datasets, evaluation criteria and parameter settings for our method, are described in this section. The algorithm was implemented with the Point Cloud Library (PCL), CloudCompare and MATLAB. All experiments were performed on a 3.60 GHz Intel Core i7-4790 processor with 12 GB of RAM.
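The three indicators follow the standard object-level definitions; a short sketch, with made-up detection counts, is:

```python
# Sketch: object-level precision, recall and F1-score from detection counts.
# The counts below are made up for illustration.

def prf(tp, fp, fn):
    """Compute precision, recall and F1 from true-positive,
    false-positive and false-negative object counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# e.g. 18 correctly classified chairs, 2 spurious detections, 3 missed:
p, r, f = prf(18, 2, 3)  # precision 0.9, recall = 18/21 ≈ 0.857
```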

Performance comparison

To compare the performance of our proposed method with that of other state-of-the-art approaches, we examined two benchmark datasets (termed “Bench I” in Fig. 9 and “Bench II” in Fig. 10) that were also tested by other methods such as Nan et al., 2012, Mattausch et al., 2014, Wang et al., 2016 and by deep learning methods such as PointCNN (Li et al., 2018b) and VoteNet (Qi et al., 2019). The methods in Nan et al., 2012, Mattausch et al., 2014 were geometry-based, while the method in Wang

Conclusion

Current methods for the classification of 3D indoor objects from point clouds rely on attributes extracted along the upright orientation and show obvious defects in classifying objects with various poses unless training data are carefully acquired. To eliminate such deficiencies, this paper presented an anchor-based graph method capable of handling arbitrarily oriented objects and evaluated its performance on seven popular benchmark datasets. Comprehensive experiments

Funding

This study was funded by the National Natural Science Foundation of China (41871298, 42071366) and the National Key R&D Program of China (2017YFB0503701).

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors would like to gratefully acknowledge Dr. Iro Armeni, Dr. Claudio Mura, Dr. Nan Liangliang, Dr. Axel Wendt, Dr. Rares Ambrus and Dr. Angela Dai for their help in accessing tested data.

References (62)

  • I. Alhashim. Deformation-driven topology-varying 3D shape correspondence. ACM Trans. Graphics (2015)
  • R. Ambrus et al. Automatic Room Segmentation from Unstructured 3-D Data of Indoor Environments. IEEE Rob. Autom. Lett. (2017)
  • I. Armeni. 3D Semantic Parsing of Large-Scale Indoor Spaces. 2016 IEEE Conference on Computer Vision and Pattern Recognition (2016)
  • Armeni, I., Sax, S., Zamir, A.R., Savarese, S., 2017. Joint 2D-3D-Semantic Data for Indoor Scene Understanding....
  • T. Breuer. Johnny: An autonomous service robot for domestic environments. J. Intell. Rob. Syst. (2011)
  • Choy, C., Gwak, J., Savarese, S., 2019. 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. 2019...
  • A. Dai. ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes
  • H. Fu et al. Upright orientation of man-made objects. ACM Trans. Graphics (2008)
  • C. Gomez et al. Object-Based Pose Graph for Dynamic Indoor Environments. IEEE Rob. Autom. Lett. (2020)
  • Y. Guo. Deep Learning for 3D Point Clouds: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. (2020)
  • S. Gupta et al. Indoor Scene Understanding with RGB-D Images: Bottom-up Segmentation, Object Detection and Semantic Segmentation. Int. J. Comput. Vision (2015)
  • Ikehata, S., Yang, H., Furukawa, Y., 2015. Structured Indoor Modeling. Proceedings of the IEEE International Conference...
  • H. Isack et al. Energy-Based Geometric Multi-model Fitting. Int. J. Comput. Vision (2012)
  • Jiang, L., Zhao, H., Liu, S., Shen, X., Fu, C.W., Jia, J., 2019. Hierarchical point-edge interaction network for point...
  • Z. Kang et al. A Review of Techniques for 3D Reconstruction of Indoor Environments. ISPRS Int. J. Geo-Inf. (2020)
  • A. Kasper et al. Using Spatial Relations of Objects in Real World Scenes for Scene Structuring and Scene Understanding
  • H. Laga et al. Geometry and context for semantic correspondences and functionality recognition in man-made 3D shapes. ACM Trans. Graphics (2013)
  • K. Lai et al. Object Recognition in 3D Point Clouds Using Web Data and Domain Adaptation. Int. J. Robot. Res. (2010)
  • B. Li et al. A UWB-Based Indoor Positioning System Employing Neural Networks. J. Geovisualiz. Spat. Anal. (2020)
  • L. Li. Reconstruction of Three-Dimensional (3D) Indoor Interiors with Multiple Stories via Comprehensive Segmentation. Remote Sens. (2018)
  • Li, Y., et al., 2018. PointCNN: Convolution On X-Transformed Points. arXiv:...