Deep point embedding for urban classification using ALS point clouds: A new perspective from local to global

https://doi.org/10.1016/j.isprsjprs.2020.02.020

Abstract

Semantic interpretation of 3D scenes is one of the most challenging problems in point cloud processing and an essential task in a wide variety of point cloud applications. The core task of semantic interpretation is semantic labeling, namely, assigning a unique semantic label to each point in the point cloud. Despite several reported approaches, semantic labeling remains challenging owing to the complexity of scenes, the varying scales of objects, and the non-homogeneity of unevenly distributed points. In this paper, we propose a novel method for obtaining semantic labels of airborne laser scanning (ALS) point clouds that embeds local context information for each point with multi-scale deep learning, applies nonlinear manifold learning for feature dimension reduction, and performs global graph-based optimization to refine the classification results. Specifically, we address the tasks of learning discriminative features and global label smoothing. The key contributions of our study are threefold. First, a hierarchical data augmentation strategy is applied to enhance the learning of deep features based on the PointNet++ network and to simultaneously eliminate the artifacts caused by division and sampling when dealing with large-scale datasets. Second, the learned hierarchical deep features are globally optimized and embedded into a low-dimensional space with a nonlinear manifold-based joint learning method, removing redundant and disturbing information. Finally, a graph-structured optimization based on the Markov random fields algorithm is performed to achieve global optimization of the initial classification results obtained with the embedded deep features, by constructing a weighted undirected graph and solving the optimization problem with graph cuts. We conducted thorough experiments on ALS point cloud datasets to assess the performance of our framework. Results indicate that, compared to other commonly used advanced classification methods, our method achieves high classification accuracy. The overall accuracy (OA) of our approach on the ISPRS benchmark dataset reaches 83.2% for classifying nine semantic classes, thereby outperforming the other compared point-based strategies. Additionally, we evaluated our framework on a selected portion of the AHN3 dataset, on which it achieved an OA of 91.2%.

Introduction

3D point clouds obtained by light detection and ranging (LiDAR) have been utilized in a wide variety of fields such as 3D city modeling (Moussa and El-Sheimy, 2010, Lafarge and Mallet, 2012, Yang et al., 2013), land cover and land use mapping (Yan et al., 2015), automatic navigation (Hebel and Stilla, 2010, Biswas and Veloso, 2012, Yang et al., 2012), change detection (Hebel and Stilla, 2011, Hebel et al., 2013), forestry monitoring (Reitberger et al., 2008, Reitberger et al., 2009, Polewski et al., 2015), construction monitoring (Bosché et al., 2015, Xu et al., 2018), historic preservation (Pan et al., 2019) and deformation monitoring (Alba et al., 2006, Zogg and Ingensand, 2008, Olsen et al., 2010). 3D point clouds have proven to be an efficient and effective tool for large-scale 3D mapping (Vosselman and Maas, 2010, Lin et al., 2016, Zhang et al., 2016, Wang et al., 2018). However, prior to using ALS data in applications, one of the essential tasks is to interpret the semantic information of the observed scenes represented by 3D point clouds. The primary objective of the semantic interpretation of ALS point clouds is to assign each 3D point a unique semantic label indicating the class of specific objects in the scene, in accordance with the geometric or radiometric information provided by the point itself and the points in its neighborhood.

To achieve the semantic labeling of points, supervised classification is typically implemented (Vosselman et al., 2017, Li et al., 2019a). It comprises two main steps: generation of distinctive features and classification of 3D points based on these features using a classifier. For feature extraction, the local context of each point is conventionally defined by its neighboring points and described by various handcrafted mathematical expressions based on the spatial or spectral attributes of these points. For training, the mathematical expressions of the selected representative samples are assembled into a feature vector and fed into a classifier along with the corresponding labels. The trained classifier can then be used to classify test samples. In the classification step, classifiers such as AdaBoost (Chan and Paelinckx, 2008), support vector machines (SVM) (Mallet et al., 2011), composite kernel SVM (Ghamisi and Höfle, 2017), and random forests (RF) (Chehata et al., 2009) are most commonly used. Despite their ease of implementation, the performance of these supervised classification approaches mainly relies on the definition of neighborhoods (Weinmann et al., 2015a) and the design of handcrafted features (Xu et al., 2019, Ghamisi and Höfle, 2017). Concerning the definition and selection of a point neighborhood in urban scenes, the scales of different object types range from tiny neighborhoods to large ones; therefore, methods using a fixed neighborhood size are insufficient. To this end, solutions for optimizing the neighborhood have been proposed, e.g., optimal neighborhood adaptation (Belton and Lichti, 2006, Demantke et al., 2011, Weinmann et al., 2015a) and multi-scale neighborhood aggregation (Kang and Yang, 2018, Xu et al., 2014, Zhang et al., 2016, Blomley and Weinmann, 2017), eliminating the strong assumption of a fixed neighborhood size.
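The eigenvalue-based optimal neighborhood adaptation cited above can be illustrated with a short sketch. In the spirit of Weinmann et al. (2015a), it selects, per point, the neighborhood size that minimizes the eigenentropy of the local 3D structure tensor; the function names and candidate sizes below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def local_eigenvalues(points, center, k):
    """Normalized eigenvalues of the 3D covariance of the k nearest neighbors."""
    d = np.linalg.norm(points - center, axis=1)
    nn = points[np.argsort(d)[:k]]
    cov = np.cov(nn.T)
    ev = np.sort(np.linalg.eigvalsh(cov))[::-1]   # lambda1 >= lambda2 >= lambda3
    ev = np.clip(ev, 1e-12, None)
    return ev / ev.sum()

def optimal_k(points, center, candidates=(10, 20, 40, 80)):
    """Pick the neighborhood size minimizing the eigenentropy of the local structure."""
    best_k, best_h = None, np.inf
    for k in candidates:
        e = local_eigenvalues(points, center, min(k, len(points)))
        h = -np.sum(e * np.log(e))                # Shannon entropy of eigenvalues
        if h < best_h:
            best_k, best_h = k, h
    return best_k
```

A lower eigenentropy indicates a more distinct local shape (line, plane, or volume), which is why minimizing it over candidate sizes adapts the neighborhood to the local structure.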
Regarding feature extraction, to exploit the importance of features for classification, various types of features such as eigenvalue-based (Chehata et al., 2009, Weinmann et al., 2015a, Weinmann et al., 2015c), waveform-based (Jutzi and Gross, 2010, Zhang et al., 2011), 2D image (Zhao et al., 2018), and height features (Maas, 1999, Gorgens et al., 2017, Sun et al., 2018) have been studied. Nevertheless, the performance of these traditional methods for both neighborhood selection and feature extraction is highly dependent on prior knowledge of the point clouds. Alternatively, popular deep-learning-based techniques provide feasible solutions for learning features directly from derived images or point sets, mitigating the burden of feature design and reducing computational effort. In these deep learning methods, spatial interactions between 3D points are implicitly considered in the networks; however, these interactions are not controllable. In some cases, heterogeneity remains in the classification results, especially in low-density areas and at the borders of urban objects. Moreover, even for classification results obtained by deep learning methods, which tend to be more robust to noise and outliers, heterogeneity is inevitable at the boundaries of patches owing to the division and sampling process required when preparing inputs for the networks. Therefore, contextual information is typically considered to improve the spatial smoothness of the classification results. Furthermore, to encode the spatial dependencies between 3D points, a graph structure is usually constructed to model the adjacency relationships. Numerous optimization strategies are based on classic graphical models such as Markov random fields (MRF) (Munoz et al., 2009, Lu and Rasmussen, 2012, Kang and Yang, 2018) and conditional random fields (CRF) (Niemeyer et al., 2014, Weinmann et al., 2015b, Yao et al., 2017, Vosselman et al., 2017, Li et al., 2019b).
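The eigenvalue-based features mentioned above can be sketched in a few lines. The standard linearity, planarity, and sphericity measures are derived from the sorted eigenvalues of a neighborhood's covariance matrix; the function below is a minimal illustration under that common formulation, not code from the paper.

```python
import numpy as np

def eigen_features(neighborhood):
    """Common eigenvalue-based shape features for one point's neighborhood (N x 3)."""
    cov = np.cov(np.asarray(neighborhood).T)
    l1, l2, l3 = np.sort(np.linalg.eigvalsh(cov))[::-1]   # l1 >= l2 >= l3
    l1 = max(l1, 1e-12)                                   # guard against degeneracy
    return {
        "linearity":  (l1 - l2) / l1,   # high for poles, wires, curb edges
        "planarity":  (l2 - l3) / l1,   # high for roofs and facades
        "sphericity": l3 / l1,          # high for scattered points, e.g. vegetation
    }
```

For instance, points sampled along a cable yield linearity near 1, while a roof patch yields high planarity, which is what makes these descriptors useful inputs to RF-style classifiers.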
In Landrieu et al. (2017), instead of relying on standard graphical models, a general mathematical optimization framework with more versatile solutions for spatial smoothing is proposed. Considering all the aforementioned aspects, the objective of our study is to propose an approach for the point-wise classification of ALS point clouds that addresses two major challenges: (i) finding an optimal representation of features for the classes of interest and (ii) improving the classification result by using an optimization algorithm.
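The graph-based label smoothing discussed above combines a data term (the classifier's confidence) with a pairwise Potts term that penalizes label disagreement between adjacent points. The sketch below uses iterated conditional modes as a simple stand-in for the graph-cuts solver the paper employs; the interface and parameter values are assumptions for illustration.

```python
import numpy as np

def icm_smooth(probs, neighbors, beta=1.0, n_iter=5):
    """Smooth per-point class probabilities with a Potts prior on a neighborhood graph.

    probs:     (N, C) class probabilities from the point-wise classifier
    neighbors: list where neighbors[i] is an index array of points adjacent to i
    beta:      weight of the pairwise smoothness term
    """
    unary = -np.log(np.clip(probs, 1e-9, 1.0))   # data term: negative log-likelihood
    labels = probs.argmax(axis=1)                # initial point-wise labels
    for _ in range(n_iter):
        for i in range(len(labels)):
            # Potts pairwise term: count of neighbors disagreeing with each candidate class
            pair = np.array([np.sum(labels[neighbors[i]] != c)
                             for c in range(probs.shape[1])])
            labels[i] = np.argmin(unary[i] + beta * pair)
    return labels
```

An isolated point whose classifier output weakly favors the "wrong" class gets flipped to agree with its confidently labeled neighbors, which is exactly the heterogeneity-removal effect the graphical models above aim for.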

The remainder of this paper is organized as follows. Section 2 discusses the related literature and presents the contributions of our work. Section 3 describes the entire workflow of our method for ALS point cloud classification. Section 4 describes the datasets and the design of the experiments. Section 5 presents and analyzes the experimental results. Additionally, Section 6 discusses the experimental results. Section 7 provides the conclusions and future directions for our work.

Section snippets

Point cloud classification with deep learning techniques

In the last decade, high-performance computation resources and data acquisition techniques have experienced exponential growth, thereby leading to the availability of large-scale datasets. Deep learning techniques, which highly rely on these two factors, have gained immense popularity over recent years. They have achieved advances in numerous fields such as scene interpretation, object detection, and target tracking. Recent investigations, especially, moving beyond the applications on 2D

Methodology

The proposed methodology for point cloud classification comprises three essential steps: hierarchical deep feature learning (HDL), joint manifold-based embedding (JME), and global graph-based optimization (GGO). Fig. 1 illustrates the framework, presenting the essential steps of the involved methods and sample results. Each step of the framework is explained in detail in the following subsections.
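The three-stage pipeline can be summarized as a simple composition of stages. The interfaces below (`deep_model`, `embed`, `smooth`, `rf`) are hypothetical placeholders, sketched only to show how HDL, JME, and GGO hand results to one another; they are not the paper's actual components.

```python
import numpy as np

def classify_points(points, deep_model, embed, smooth, rf):
    """Sketch of the HDL -> JME -> GGO pipeline with hypothetical stage interfaces.

    1. HDL: a PointNet++-style network maps each point to a high-dimensional deep feature.
    2. JME: a manifold-learning embedding reduces the features to a compact space.
    3. GGO: initial RF predictions are refined by a graph-based smoothing step.
    """
    deep_features = deep_model(points)   # (N, D) hierarchical deep features
    embedded = embed(deep_features)      # (N, d) embedded features, d << D
    initial = rf.predict(embedded)       # (N,) point-wise labels from the classifier
    return smooth(points, initial)       # (N,) globally smoothed labels
```

Keeping the stages as separate callables mirrors the paper's modular design: any one stage (e.g., the embedding) can be swapped or ablated without touching the others.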

The HDL step is designed for learning the original features of points. Here,

Experiments

For the performance assessment of our method, experiments using two ALS datasets were conducted, and their results were evaluated and analyzed. The first experiment uses the ISPRS benchmark dataset of Vaihingen for 3D labeling of ALS point clouds (ISPRS benchmark dataset) (Cramer, 2010, Rottensteiner et al., 2012), whereas the second uses the ALS dataset of a selected area provided by Actueel Hoogtebestand Nederland (Xu et al., 2014, Vosselman et al., 2017).

Results using ISPRS benchmark dataset

To test the performance of our feature learning method, we compared results under the following conditions: (i) using original single-scale deep features (SDF) with the original PointNet++, (ii) using the proposed MDF with an RF classifier, and (iii) using joint embedded features (JEF) with an RF classifier. Furthermore, to evaluate the performance of the feature dimensionality reduction in addition to our method, we implemented several classic dimensionality reduction methods to
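The shape of such a comparison can be sketched with synthetic data. The snippet below trains an RF classifier on raw high-dimensional features and on a trivial low-dimensional "embedding" that keeps the informative axes; the data, feature dimensions, and accuracies are entirely synthetic stand-ins, not the paper's experiment.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 300
labels = rng.integers(0, 3, n)                 # three synthetic classes
raw = rng.standard_normal((n, 64))             # stand-in for 64-D deep features
raw[:, 0] += labels                            # only the first two dimensions
raw[:, 1] += 2 * labels                        # carry class signal
embedded = raw[:, :2]                          # toy "embedding" keeping those axes

for name, X in [("raw 64-D features", raw), ("embedded 2-D features", embedded)]:
    rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X[:200], labels[:200])
    acc = (rf.predict(X[200:]) == labels[200:]).mean()
    print(f"{name}: OA = {acc:.2f}")
```

Discarding the 62 uninformative dimensions typically keeps (or improves) accuracy while shrinking the classifier's input, which is the intuition behind comparing SDF against JEF.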

Influence of different hyperparameters on embedding performance

To effectively obtain an optimal feature representation, high-dimensional features are embedded into a reduced-dimensional feature space by using joint manifold learning; however, the performance of JME is highly dependent on certain hyperparameters. To investigate the influence of the hyperparameters on the embedding performance, we assessed the sensitivity of these parameters by altering their values within reasonable ranges. Two key parameters, the number of neighbors and the dimensionality, were tested in
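A sensitivity sweep over these two hyperparameters can be sketched as follows. Isomap on a swiss-roll dataset serves here only as a generic stand-in for the paper's JME method; the neighbor counts and the synthetic data are illustrative assumptions.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# Synthetic nonlinear manifold as a proxy for high-dimensional deep features.
X, _ = make_swiss_roll(n_samples=500, random_state=0)

# Sweep the neighborhood-size hyperparameter at a fixed target dimensionality:
# too few neighbors fragments the manifold graph, too many short-circuits it.
for n_neighbors in (8, 12, 24):
    emb = Isomap(n_neighbors=n_neighbors, n_components=2).fit_transform(X)
    print(f"n_neighbors={n_neighbors}: embedded shape {emb.shape}")
```

In practice, one would score each configuration by downstream classification accuracy, as the sensitivity analysis in this section does, rather than merely inspecting the embedding's shape.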

Conclusion

In this paper, we proposed a novel framework for the point-based classification of ALS point clouds. We used a deep neural network, PointNet++, to directly extract deep features from pointsets. Furthermore, to generate features with high robustness and distinction, a hierarchical subdivision strategy and a novel robust manifold-learning-based algorithm were employed for MDF embedding. Finally, the classification results were optimized to make them locally continuous and globally optimal by

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This research is supported by the China Scholarship Council. This work was carried out within the frame of Leonhard Obermeyer Center (LOC) at Technische Universität München (TUM) [www.loc.tum.de].

References (93)

  • C. Mallet et al., Relevance assessment of full-waveform lidar data for urban area classification, ISPRS J. Photogram. Remote Sens. (2011)
  • J. Niemeyer et al., Contextual classification of lidar data and building object detection in urban areas, ISPRS J. Photogram. Remote Sens. (2014)
  • P. Polewski et al., Detection of fallen trees in ALS point clouds using a normalized cut approach trained by simulation, ISPRS J. Photogram. Remote Sens. (2015)
  • J. Reitberger et al., 3D segmentation of single trees exploiting full waveform lidar data, ISPRS J. Photogram. Remote Sens. (2009)
  • G. Vosselman et al., Contextual segment-based classification of airborne laser scanner data, ISPRS J. Photogram. Remote Sens. (2017)
  • Y. Wang et al., Dynamic graph CNN for learning on point clouds, ACM Trans. Graph. (2019)
  • M. Weinmann et al., Semantic point cloud interpretation based on optimal neighborhoods, relevant features and efficient classifiers, ISPRS J. Photogram. Remote Sens. (2015)
  • M. Weinmann et al., Distinctive 2D and 3D features for automated large-scale scene analysis in urban areas, Comput. Graph. (2015)
  • S. Xu et al., Multiple-entity based classification of airborne laser scanning data in urban areas, ISPRS J. Photogram. Remote Sens. (2014)
  • W.Y. Yan et al., Urban land cover classification using airborne lidar data: a review, Remote Sens. Environ. (2015)
  • M. Yousefhussien et al., A multi-scale fully convolutional network for semantic labeling of 3D point clouds, ISPRS J. Photogram. Remote Sens. (2018)
  • M. Alba et al., Structural monitoring of a large dam by terrestrial laser scanning, Int. Arch. Photogram. Remote Sens. Spatial Inform. Sci. (2006)
  • I. Armeni et al., 3D semantic parsing of large-scale indoor spaces
  • C.M. Bachmann et al., Exploiting manifold geometry in hyperspectral imagery, IEEE Trans. Geosci. Remote Sens. (2005)
  • M. Belkin et al., Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput. (2003)
  • D. Belton et al., Classification and segmentation of terrestrial laser scanner point clouds using local variance information, Int. Arch. Photogram. Remote Sens. Spatial Inform. Sci. (2006)
  • J. Biswas et al., Depth camera based indoor mobile robot localization and navigation
  • R. Blomley et al., Using multi-scale features for the 3D semantic labeling of airborne laser scanning data, ISPRS Ann. Photogram. Remote Sens. Spatial Inform. Sci. (2017)
  • Y. Boykov et al., An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision, IEEE Trans. Pattern Anal. Mach. Intell. (2004)
  • Y. Boykov et al., Fast approximate energy minimization via graph cuts, IEEE Trans. Pattern Anal. Mach. Intell. (2001)
  • N. Chehata et al., Airborne lidar feature selection for urban classification using random forests, Int. Arch. Photogram. Remote Sens. Spatial Inform. Sci. (2009)
  • X. Chen et al., Multi-view 3D object detection network for autonomous driving
  • Cramer, M., 2010. The DGPF-test on digital airborne camera evaluation–overview and test design....
  • A. Dai et al., ScanNet: Richly-annotated 3D reconstructions of indoor scenes
  • J. Demantke et al., Dimensionality based scale selection in 3D lidar point clouds, Int. Arch. Photogram. Remote Sens. Spatial Inform. Sci. (2011)
  • M. Engelcke et al., Vote3Deep: Fast object detection in 3D point clouds using efficient convolutional neural networks
  • Geiger, A., Lenz, P., Urtasun, R., 2012. Are we ready for autonomous driving? The KITTI vision benchmark suite. In:...
  • P. Ghamisi et al., Lidar data classification using extinction profiles and a composite kernel support vector machine, IEEE Geosci. Remote Sens. Lett. (2017)
  • E.B. Gorgens et al., A method for optimizing height threshold when computing airborne laser scanning metrics, Photogram. Eng. Remote Sens. (2017)
  • Hackel, T., Savinov, N., Ladicky, L., Wegner, J.D., Schindler, K., Pollefeys, M., 2017. SEMANTIC3D.NET: A new...
  • Hebel, M., Stilla, U., 2010. ALS-aided navigation of helicopters or UAVs over urban terrain. In: EuroCOW 2010, The...
  • M. Hebel et al., Simultaneous calibration of ALS systems and alignment of multiview lidar scans of urban areas, IEEE Trans. Geosci. Remote Sens. (2011)
  • D. Hong et al., Learning a robust local manifold representation for hyperspectral dimensionality reduction, IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. (2017)
  • R. Huang et al., Multi-scale local context embedding for lidar point cloud classification, IEEE Geosci. Remote Sens. Lett. (2019)
  • X. Huo et al., Local linear projection (LLP)
  • B. Jutzi et al., Investigations on surface reflection models for intensity normalization in airborne laser scanning (ALS) data, Photogram. Eng. Remote Sens. (2010)