Deep point embedding for urban classification using ALS point clouds: A new perspective from local to global
Introduction
3D point clouds obtained by light detection and ranging (LiDAR) have been utilized in a wide variety of fields such as 3D city modeling (Moussa and El-Sheimy, 2010, Lafarge and Mallet, 2012, Yang et al., 2013), land cover and land use mapping (Yan et al., 2015), automatic navigation (Hebel and Stilla, 2010, Biswas and Veloso, 2012, Yang et al., 2012), change detection (Hebel and Stilla, 2011, Hebel et al., 2013), forestry monitoring (Reitberger et al., 2008, Reitberger et al., 2009, Polewski et al., 2015), construction monitoring (Bosché et al., 2015, Xu et al., 2018), historic preservation (Pan et al., 2019), and deformation monitoring (Alba et al., 2006, Zogg and Ingensand, 2008, Olsen et al., 2010). 3D point clouds have proven to be an efficient and effective tool for large-scale 3D mapping (Vosselman and Maas, 2010, Lin et al., 2016, Zhang et al., 2016, Wang et al., 2018). However, before airborne laser scanning (ALS) data can be used in such applications, one essential task is to interpret the semantic information of the observed scenes presented by the 3D point clouds. The primary objective of the semantic interpretation of ALS point clouds is to assign each 3D point a unique semantic label indicating the class of a specific object in the scene, in accordance with the geometric or radiometric information provided by the point itself and the points in its neighborhood.
To achieve the semantic labeling of points, supervised classification is typically implemented (Vosselman et al., 2017, Li et al., 2019a). It comprises two main steps: the generation of distinctive features and the classification of the 3D points, represented by these features, with a classifier. For feature extraction, the local context of each point is conventionally defined by its neighboring points and described by various handcrafted mathematical expressions based on the spatial or spectral attributes of these points. For training, the mathematical expressions of the selected representative samples are integrated into a feature vector and fed into a classifier along with the corresponding labels. The trained classifier can subsequently be used to classify test samples. In the classification step, classifiers such as AdaBoost (Chan and Paelinckx, 2008), support vector machines (SVM) (Mallet et al., 2011), composite kernel SVM (Ghamisi and Höfle, 2017), and random forest (RF) (Chehata et al., 2009) are most commonly used. Despite their straightforward implementation, the performance of these supervised classification approaches relies mainly on the definition of neighborhoods (Weinmann et al., 2015a) and the design of handcrafted features (Xu et al., 2019, Ghamisi and Höfle, 2017). Concerning the definition and selection of a point neighborhood in urban scenes, the scales of different types of objects vary from tiny neighborhoods to large ones; methods using a fixed neighborhood size are therefore insufficient. To this end, solutions for optimizing the neighborhood have been proposed, e.g., optimal neighborhood adaptation (Belton and Lichti, 2006, Demantke et al., 2011, Weinmann et al., 2015a) and multi-scale neighborhood aggregation (Kang and Yang, 2018, Xu et al., 2014, Zhang et al., 2016, Blomley and Weinmann, 2017), which eliminate the strong assumption of a fixed neighborhood setting.
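As an illustration of this conventional pipeline, the sketch below computes eigenvalue-based neighborhood features (linearity, planarity, sphericity) over a fixed k-neighborhood and feeds them to a random forest. The feature set, toy data, and all parameter values are illustrative choices for the general approach, not those of any specific cited method.

```python
import numpy as np
from scipy.spatial import cKDTree
from sklearn.ensemble import RandomForestClassifier

def eigenvalue_features(points, k=20):
    """Per-point linearity, planarity, sphericity from the k-NN covariance."""
    tree = cKDTree(points)
    feats = np.zeros((len(points), 3))
    for i, p in enumerate(points):
        _, idx = tree.query(p, k=k)
        cov = np.cov(points[idx].T)
        # Eigenvalues sorted descending; clamp to avoid division by zero.
        l1, l2, l3 = np.maximum(np.linalg.eigvalsh(cov)[::-1], 1e-12)
        feats[i] = [(l1 - l2) / l1,   # linearity
                    (l2 - l3) / l1,   # planarity
                    l3 / l1]          # sphericity
    return feats

# Toy data: a volumetric blob vs. a planar patch, standing in for two classes.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 1, (100, 3)),
                 np.c_[rng.uniform(5, 8, (100, 2)), np.zeros(100)]])
labels = np.r_[np.zeros(100, int), np.ones(100, int)]
X = eigenvalue_features(pts, k=15)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, labels)
print(clf.score(X, labels))
```

A fixed k is exactly the assumption criticized above: a planar roof and a tree crown rarely share one best neighborhood size, which motivates the optimal- and multi-scale-neighborhood strategies cited.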
Regarding feature extraction, various types of features, such as eigenvalue-based (Chehata et al., 2009, Weinmann et al., 2015a, Weinmann et al., 2015c), waveform-based (Jutzi and Gross, 2010, Zhang et al., 2011), 2D image (Zhao et al., 2018), and height features (Maas, 1999, Gorgens et al., 2017, Sun et al., 2018), have been studied to exploit their importance for classification. Nevertheless, the performance of these traditional methods for both neighborhood selection and feature extraction is highly dependent on prior knowledge of the point clouds. Alternatively, popular deep-learning-based techniques provide feasible solutions for learning features directly from derived images or point sets, mitigating the burden of feature design and the associated computational effort. In such methods, spatial interactions between 3D points are implicitly considered in the networks; however, these interactions are not controllable. In some cases, heterogeneity remains in the classification results, especially in low-density areas and at the borders of urban objects. Moreover, even for classification results obtained by deep learning methods, which appear more robust to noise and outliers, heterogeneity is inevitable at the boundaries of patches owing to the division and sampling process required when preparing inputs for the networks. Therefore, contextual information is typically considered to improve the spatial smoothness of the classification results. Furthermore, to encode the spatial dependencies between 3D points, a graph structure is usually constructed to model the adjacency relationships. Numerous optimization strategies are based on classic graphical models such as Markov random fields (MRF) (Munoz et al., 2009, Lu and Rasmussen, 2012, Kang and Yang, 2018) and conditional random fields (CRF) (Niemeyer et al., 2014, Weinmann et al., 2015b, Yao et al., 2017, Vosselman et al., 2017, Li et al., 2019b).
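As a minimal illustration of such neighborhood-based smoothing, the sketch below applies an iterative k-NN majority vote to noisy point labels. It is a deliberately simple stand-in for MRF/CRF inference over an adjacency graph, and the data and parameters are invented for the example.

```python
import numpy as np
from scipy.spatial import cKDTree

def smooth_labels(points, labels, k=10, n_iter=3):
    """Replace each point's label by the majority label among its k nearest
    neighbors (self included), iterated a few times -- a crude stand-in for
    graph-based regularization such as MRF/CRF inference."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)
    smoothed = labels.copy()
    for _ in range(n_iter):
        new = np.array([np.bincount(smoothed[row]).argmax() for row in idx])
        if np.array_equal(new, smoothed):   # converged
            break
        smoothed = new
    return smoothed

# Toy scene: two well-separated clusters with a few noisy label flips.
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(10, 1, (100, 3))])
labels = np.r_[np.zeros(100, int), np.ones(100, int)]
noisy = labels.copy()
flip = rng.choice(200, size=10, replace=False)
noisy[flip] = 1 - noisy[flip]
cleaned = smooth_labels(pts, noisy, k=15)
print((cleaned == labels).mean())
```

Unlike a plain majority vote, a CRF additionally weighs how confident the classifier was in each label against how strongly neighbors agree, which is why it behaves better near true class boundaries.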
In Landrieu et al. (2017), rather than relying on a fixed standard graphical model, a general mathematical optimization framework with more versatile solutions for spatial smoothing is proposed. Considering all the aforementioned aspects, the objective of our study is to propose an approach for the point-wise classification of ALS point clouds that addresses the following two major challenges: (i) finding an optimal representation of features for the classes of interest and (ii) improving the classification result by using an optimization algorithm.
The remainder of this paper is organized as follows. Section 2 discusses the related literature and presents the contributions of our work. Section 3 describes the entire workflow of our method for ALS point cloud classification. Section 4 describes the datasets and the design of the experiments. Section 5 presents and analyzes the experimental results. Section 6 discusses the experimental results. Section 7 provides the conclusions and future directions for our work.
Point cloud classification with deep learning techniques
In the last decade, high-performance computation resources and data acquisition techniques have experienced exponential growth, thereby leading to the availability of large-scale datasets. Deep learning techniques, which highly rely on these two factors, have gained immense popularity over recent years. They have achieved advances in numerous fields such as scene interpretation, object detection, and target tracking. Recent investigations, especially, moving beyond the applications on 2D
Methodology
The proposed methodology for point cloud classification comprises three essential steps: hierarchical deep feature learning (HDL), joint manifold-based embedding (JME), and global graph-based optimization (GGO). Fig. 1 illustrates the framework, presenting the essential steps of the involved methods together with sample results. Each step of the framework is explained in detail in the following subsections.
The HDL step is designed for learning the original features of points. Here,
Experiments
For the performance assessment of our method, experiments using two ALS datasets were conducted, and their results were evaluated and analyzed. The first experiment uses the ISPRS benchmark dataset of Vaihingen for 3D labeling of ALS point clouds (ISPRS benchmark dataset) (Cramer, 2010, Rottensteiner et al., 2012), whereas the second uses the ALS dataset of a selected area provided by Actueel Hoogtebestand Nederland (Xu et al., 2014, Vosselman et al., 2017).
Results using ISPRS benchmark dataset
To test the performance of our feature learning method, we conducted comparisons between the results with the following conditions: (i) using original single-scale deep features (SDF) with original PointNet++, (ii) using the proposed MDF with RF classifier, and (iii) using joint embedded features (JEF) with RF classifier. Furthermore, to evaluate the performance of the feature dimensionality reduction in addition to our method, we implemented several classic dimensionality reduction methods to
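Such a comparison of dimensionality reduction front-ends can be set up generically with scikit-learn. In this sketch, synthetic high-dimensional features stand in for the deep features, and PCA and locally linear embedding stand in for the compared reduction methods; none of this reproduces the paper's actual experiments or numbers.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic 64-D features standing in for per-point deep features.
X, y = make_classification(n_samples=400, n_features=64, n_informative=10,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for name, reducer in [("PCA", PCA(n_components=10)),
                      ("LLE", LocallyLinearEmbedding(n_components=10,
                                                     n_neighbors=12))]:
    Z_tr = reducer.fit_transform(X_tr)   # embed the training features
    Z_te = reducer.transform(X_te)       # out-of-sample embedding
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    results[name] = clf.fit(Z_tr, y_tr).score(Z_te, y_te)
    print(f"{name}: {results[name]:.3f}")
```

Holding the downstream RF classifier fixed while only the reducer varies is what isolates the contribution of the embedding, mirroring the comparison protocol described above.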
Influence of different hyperparameters on embedding performance
To effectively obtain an optimal feature representation, high-dimensional features are embedded into a reduced-dimensional feature space by joint manifold learning; however, the performance of JME is highly dependent on certain hyperparameters. To investigate their influence on the embedding performance, we assessed the sensitivity of these parameters by varying their values over reasonable ranges. Two key parameters, the number of neighbors and the embedding dimensionality, were tested in
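A sensitivity study of this kind can be reproduced generically as a small grid sweep. Here scikit-learn's Isomap and the digits dataset are stand-ins for the JME embedding and the ALS features (the ranges and score are illustrative), with downstream k-NN accuracy as the quality measure.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import Isomap
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X, y = X[:500], y[:500]                 # small subset to keep the sweep fast

best = None
for n_neighbors in (10, 20, 30):        # neighborhood-graph size
    for n_components in (5, 10, 20):    # embedding dimensionality
        Z = Isomap(n_neighbors=n_neighbors,
                   n_components=n_components).fit_transform(X)
        acc = cross_val_score(KNeighborsClassifier(), Z, y, cv=3).mean()
        if best is None or acc > best[0]:
            best = (acc, n_neighbors, n_components)
print(best)   # (best accuracy, n_neighbors, n_components)
```

The same pattern, an outer loop over hyperparameter values with a fixed downstream evaluation, applies regardless of which manifold-learning method sits inside the loop.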
Conclusion
In this paper, we proposed a novel framework for the point-based classification of ALS point clouds. We used a deep neural network, PointNet++, to directly extract deep features from pointsets. Furthermore, to generate features with high robustness and distinction, a hierarchical subdivision strategy and a novel robust manifold-learning-based algorithm were employed for MDF embedding. Finally, the classification results were optimized to make them locally continuous and globally optimal by
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
This research is supported by the China Scholarship Council. This work was carried out within the framework of the Leonhard Obermeyer Center (LOC) at the Technische Universität München (TUM) [www.loc.tum.de].
References (93)
- The value of integrating Scan-to-BIM and Scan-vs-BIM techniques for construction monitoring using laser scanning and BIM: the case of cylindrical MEP components. Autom. Constr. (2015)
- SnapNet: 3D point cloud semantic labeling with 2D deep segmentation networks. Comput. Graph. (2018)
- Evaluation of random forest and AdaBoost tree-based ensemble classification and spectral band selection for ecotope mapping using airborne hyperspectral imagery. Remote Sens. Environ. (2008)
- Classification of airborne laser scanning data using JointBoost. ISPRS J. Photogram. Remote Sens. (2015)
- Change detection in urban areas by object-based analysis and on-the-fly comparison of multi-view ALS data. ISPRS J. Photogram. Remote Sens. (2013)
- A probabilistic graphical model for the classification of mobile LiDAR point clouds. ISPRS J. Photogram. Remote Sens. (2018)
- A structured regularization framework for spatially smoothing semantic labelings of 3D point clouds. ISPRS J. Photogram. Remote Sens. (2017)
- Improving LiDAR classification accuracy by contextual label smoothing in post-processing. ISPRS J. Photogram. Remote Sens. (2019)
- Higher-order conditional random fields-based 3D semantic labeling of airborne laser-scanning point clouds. Remote Sens. (2019)
- Planar-based adaptive down-sampling of point clouds. Photogram. Eng. Remote Sens. (2016)
- Relevance assessment of full-waveform LiDAR data for urban area classification. ISPRS J. Photogram. Remote Sens.
- Contextual classification of LiDAR data and building object detection in urban areas. ISPRS J. Photogram. Remote Sens.
- Detection of fallen trees in ALS point clouds using a normalized cut approach trained by simulation. ISPRS J. Photogram. Remote Sens.
- 3D segmentation of single trees exploiting full waveform LiDAR data. ISPRS J. Photogram. Remote Sens.
- Contextual segment-based classification of airborne laser scanner data. ISPRS J. Photogram. Remote Sens.
- Dynamic graph CNN for learning on point clouds. ACM Trans. Graph.
- Semantic point cloud interpretation based on optimal neighborhoods, relevant features and efficient classifiers. ISPRS J. Photogram. Remote Sens.
- Distinctive 2D and 3D features for automated large-scale scene analysis in urban areas. Comput. Graph.
- Multiple-entity based classification of airborne laser scanning data in urban areas. ISPRS J. Photogram. Remote Sens.
- Urban land cover classification using airborne LiDAR data: a review. Remote Sens. Environ.
- A multi-scale fully convolutional network for semantic labeling of 3D point clouds. ISPRS J. Photogram. Remote Sens.
- Structural monitoring of a large dam by terrestrial laser scanning. Int. Arch. Photogram. Remote Sens. Spatial Inform. Sci.
- 3D semantic parsing of large-scale indoor spaces
- Exploiting manifold geometry in hyperspectral imagery. IEEE Trans. Geosci. Remote Sens.
- Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput.
- Classification and segmentation of terrestrial laser scanner point clouds using local variance information. Int. Arch. Photogram. Remote Sens. Spatial Inform. Sci.
- Depth camera based indoor mobile robot localization and navigation
- Using multi-scale features for the 3D semantic labeling of airborne laser scanning data. ISPRS Ann. Photogram. Remote Sens. Spatial Inform. Sci.
- An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Trans. Pattern Anal. Mach. Intell.
- Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Anal. Mach. Intell.
- Airborne LiDAR feature selection for urban classification using random forests. Int. Arch. Photogram. Remote Sens. Spatial Inform. Sci.
- Multi-view 3D object detection network for autonomous driving
- ScanNet: richly-annotated 3D reconstructions of indoor scenes
- Dimensionality based scale selection in 3D LiDAR point clouds. Int. Arch. Photogram. Remote Sens. Spatial Inform. Sci.
- Vote3Deep: fast object detection in 3D point clouds using efficient convolutional neural networks
- LiDAR data classification using extinction profiles and a composite kernel support vector machine. IEEE Geosci. Remote Sens. Lett.
- A method for optimizing height threshold when computing airborne laser scanning metrics. Photogram. Eng. Remote Sens.
- Simultaneous calibration of ALS systems and alignment of multiview LiDAR scans of urban areas. IEEE Trans. Geosci. Remote Sens.
- Learning a robust local manifold representation for hyperspectral dimensionality reduction. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens.
- Multi-scale local context embedding for LiDAR point cloud classification. IEEE Geosci. Remote Sens. Lett.
- Local linear projection (LLP)
- Investigations on surface reflection models for intensity normalization in airborne laser scanning (ALS) data. Photogram. Eng. Remote Sens.