Semantic segmentation on Swiss3DCities: A benchmark study on aerial photogrammetric 3D pointcloud dataset
Introduction
Many recent achievements of deep learning depend on the availability of large labeled training datasets [10], [27], such as ImageNet [9] for image classification and MS COCO [18] for image segmentation. In this work, we propose a new dataset of dense urban 3D pointclouds, acquired using photogrammetry over three cities in Switzerland (Zurich, Zug and Davos). The entire dataset is manually annotated with dense labels, which associate each point to one of five categories: terrain, construction, vegetation, vehicle, and urban asset.
The main goal of the dataset is to train semantic segmentation algorithms for urban environments. Semantic segmentation consists of partitioning the data into multiple sets of points, such that each set contains only objects of a given type. The problem is relevant to many real-world applications, such as autonomous driving, content generation for games [15], augmented reality applications, and city planning [35].
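Concretely, a labeled point cloud of this kind can be represented as an array of per-point features together with one integer class label per point; the partition then follows directly from the labels. The sketch below is purely illustrative (the array layout and toy values are assumptions, not the dataset's actual format):

```python
import numpy as np

# The five semantic categories used in the dataset.
CLASSES = ["terrain", "construction", "vegetation", "vehicle", "urban asset"]

# A toy cloud of N points: columns are x, y, z, r, g, b.
points = np.array([
    [0.0, 0.0, 0.0, 120, 110, 100],   # ground-level point
    [0.0, 0.0, 8.0, 200,  60,  50],   # rooftop point
    [2.0, 1.0, 3.0,  40, 140,  40],   # tree-canopy point
], dtype=np.float64)

# Semantic segmentation assigns one class index to every point.
labels = np.array([0, 1, 2])          # terrain, construction, vegetation

# Partition the cloud into per-class subsets.
segments = {CLASSES[c]: points[labels == c] for c in np.unique(labels)}
```

Each value in `segments` is then a sub-cloud containing points of a single category.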
Most existing datasets [3], [25], [30] for outdoor 3D semantic segmentation are motivated by real-time autonomous driving applications, and are therefore acquired at low resolution by street-level Light Detection and Ranging (LiDAR) sensors; this yields incomplete point clouds (for example, areas far from roads, such as roofs, are either not acquired or acquired with very low resolution) which are unsuitable for applications such as city planning, urban augmented or virtual reality (AR/VR), or gaming. In contrast, we acquire high-resolution photographs from unmanned aerial vehicles (UAV) flying on a grid pattern over the area of interest, then reconstruct the 3D shape using photogrammetry; this allows us to densely acquire most outdoor surfaces. Similar approaches have been previously adopted for several applications, including automatic urban area mapping [20], damage detection [19], and cultural heritage site mapping for digital preservation [22]. Compared to 3D models built by satellite-borne cameras, this approach yields models with higher-resolution geometry and texture.
High resolution data yields more accurate models, but also aids the segmentation task because it contains more information to discriminate between different classes; currently, state-of-the-art models for 3D semantic segmentation rely on deep learning [4], [11], [36] and represent input data as voxels [37], points [23] or meshes [17]; other approaches render multiple views of the 3D scene and then rely on 2D semantic segmentation models [5], [28], [29], which can be trained on more abundant 2D labeled semantic segmentation datasets.
To show the potential of our dataset for training and evaluating segmentation algorithms, we consider the well-established PointNet++ model [23], [24] and report its performance when using different splits for training and evaluation. In particular, the performance of machine learning models depends not only on the size of the training dataset, but also on how representative it is of the evaluation data: often, models trained on large amounts of data from a given environment fail to generalize to a different target environment. Because our dataset contains data from three cities with different characteristics, it can be used to explore this fundamental aspect.
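Cross-city experiments of this kind amount to choosing training and evaluation splits by city rather than at random. A minimal sketch of the bookkeeping (tile names and the split policy here are hypothetical, not the paper's exact protocol):

```python
# Each tile is tagged with its source city; generalization is probed
# by holding out one entire city instead of splitting tiles at random.
tiles = {
    "zurich_tile_0": "zurich",
    "zurich_tile_1": "zurich",
    "zug_tile_0":    "zug",
    "davos_tile_0":  "davos",
}

def city_holdout_split(tiles, held_out_city):
    """Train on all cities except `held_out_city`; evaluate on it."""
    train = [t for t, c in tiles.items() if c != held_out_city]
    test = [t for t, c in tiles.items() if c == held_out_city]
    return train, test

train, test = city_holdout_split(tiles, "davos")
```

Comparing such a held-out-city split against a random split exposes the generalization gap discussed above.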
The rest of the paper is organized into five sections. We first describe related commonly-used datasets for 3D semantic segmentation in Section 2. Then, in Section 3 we present our main contribution: a new pointwise labeled multi-city dataset for semantic segmentation of outdoor 3D point clouds, which we release to the research community in three versions with different point densities1; we characterize the dataset and describe the data acquisition, processing and manual labeling pipelines. In Section 4 we describe the deep learning model applied to demonstrate the semantic segmentation task on our dataset. We discuss quantitative results in Section 5, where we also explore the model's generalization ability across different cities (secondary contribution). Section 6 concludes the paper.
Section snippets
Related work
This section summarizes relevant pointcloud datasets with semantic segmentation labels (see Table 1). One fundamental difference among the datasets is their acquisition modality, i.e. LiDAR or photogrammetry.
Dataset description
We describe the process used to produce our large scale aerial photogrammetry dataset, covering both acquisition of source photographs and processing to obtain 3D point clouds. We conclude the section by detailing the data characteristics.
Semantic segmentation
To provide baseline performance metrics, we report experiments using PointNet++ [24], a well-established pointcloud segmentation approach.
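PointNet++ builds a hierarchy of local regions by repeatedly subsampling the cloud at each set-abstraction level, typically with farthest point sampling (FPS). A minimal numpy sketch of FPS (illustrative only, not the authors' implementation):

```python
import numpy as np

def farthest_point_sampling(points, k):
    """Greedily pick k point indices, each maximizing the distance
    to the set of points already chosen (O(n*k) time)."""
    n = points.shape[0]
    chosen = [0]                      # start from an arbitrary point
    dist = np.full(n, np.inf)         # distance to nearest chosen point
    for _ in range(k - 1):
        # Update each point's distance using the most recently chosen point.
        d = np.linalg.norm(points - points[chosen[-1]], axis=1)
        dist = np.minimum(dist, d)
        chosen.append(int(np.argmax(dist)))
    return np.asarray(chosen)

pts = np.random.rand(1000, 3)         # toy cloud
idx = farthest_point_sampling(pts, 64)
```

FPS yields subsampled centroids that cover the cloud more evenly than uniform random sampling, which is why PointNet++ uses it to seed its local neighborhoods.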
Overall performance metrics
On the three testing tiles, the model trained on all cities yields an overall accuracy of 82.8%, average F1 of 56.0%, and average IoU score of 45.3%.
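All three metrics can be derived from a per-class confusion matrix; the sketch below shows the standard computation (the small confusion matrix is made up for illustration, not taken from the paper's results):

```python
import numpy as np

def segmentation_metrics(conf):
    """conf[i, j] = number of points of true class i predicted as class j."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp            # false positives per class
    fn = conf.sum(axis=1) - tp            # false negatives per class
    overall_acc = tp.sum() / conf.sum()
    f1 = 2 * tp / np.maximum(2 * tp + fp + fn, 1)
    iou = tp / np.maximum(tp + fp + fn, 1)
    return overall_acc, f1.mean(), iou.mean()

# Toy 3-class confusion matrix (rows: ground truth, columns: prediction).
conf = np.array([[90,  5,  5],
                 [10, 80, 10],
                 [ 5, 15, 80]])
acc, mean_f1, mean_iou = segmentation_metrics(conf)
```

Note that per-class IoU is always at most per-class F1, which is why the average IoU reported above is lower than the average F1.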
Per-category performance
Table 2 reports, for each city and for each class, the performance of the model trained on data from all cities; namely, we report the per-class F1 score and IoU metrics.
We observe that the "urban asset" and "vehicle" classes are harder to segment than the other classes; this is expected due to their small size and widely variable characteristics.
Conclusion
This paper introduces a novel urban pointcloud dataset with pointwise semantic groundtruth. The dataset is constructed via photogrammetry on UAV-acquired high-resolution images of three Swiss cities. The dataset is released at three pointcloud densities: a sparse pointcloud with RGB colors and semantic labels, a regular-density pointcloud with RGB colors and semantic labels, and a dense pointcloud with only x, y, z coordinates.
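One common way to derive lower-density versions of a dense cloud is voxel-grid subsampling, keeping one point per occupied voxel; whether this is the exact procedure used for the released versions is not stated here, so the sketch below is illustrative only:

```python
import numpy as np

def voxel_downsample(points, voxel_size):
    """Keep the first point that falls in each voxel of side `voxel_size`."""
    keys = np.floor(points[:, :3] / voxel_size).astype(np.int64)
    _, first_idx = np.unique(keys, axis=0, return_index=True)
    return points[np.sort(first_idx)]

dense = np.random.rand(10000, 3)       # toy dense cloud in the unit cube
sparse = voxel_downsample(dense, 0.25)  # at most one point per 0.25-side voxel
```

Larger voxel sizes trade spatial detail for smaller files, which is the trade-off the three released density variants expose.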
The paper describes the acquisition and processing of the dataset, then reports baseline semantic segmentation results obtained with PointNet++, including an analysis of generalization across cities.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work is supported partially by Nomoko AG and the Swiss Confederation through Innosuisse research project 31889.1 IP-ICT and through the NCCR Robotics. The authors would like to thank the collaborators in the SUPSI-ISIN institute and the colleagues in Nomoko AG for their contributions, specifically to Juan Vinuales and Mario Sanchez Gallardo for the drone flight operations; to Alexandre Ferreira do Carmo, Hugo Filipe Queiros da Cunha, Vincent Schmid, and Simon Scherer for supporting the
References (38)
- Snapnet: 3D point cloud semantic labeling with 2D deep segmentation networks. Computers & Graphics (2018)
- Combining forecasts: a review and annotated bibliography. Int. J. Forecast. (1989)
- Optimal linear combinations of neural networks. Neural Networks (1997)
- A multi-view recurrent neural network for 3D mesh segmentation. Computers & Graphics (2017)
- Deep learning-based damage detection from aerial SfM point clouds. Drones (2019)
- Contextual classification of LiDAR data and building object detection in urban areas. ISPRS J. Photogramm. Remote Sens. (2014)
- Deep learning in neural networks: an overview. Neural Networks (2015)
- 3D semantic parsing of large-scale indoor spaces. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)
- Classification of aerial photogrammetric 3D point clouds. Photogrammetric Engineering & Remote Sensing (2018)
- SemanticKITTI: a dataset for semantic scene understanding of LiDAR sequences. Proceedings of the IEEE International Conference on Computer Vision (2019)
- Deep learning on 3D point clouds. Remote Sensing (Basel)
- ImageNet: a large-scale hierarchical image database. IEEE Conference on Computer Vision and Pattern Recognition (2009)
- Deep learning
- Deep learning for 3D point clouds: a survey. IEEE Trans. Pattern Anal. Mach. Intell.
- Semantic3D.net: a new large-scale point cloud classification benchmark. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
- 3D reconstruction from photographs by CMP SfM web service. 14th IAPR International Conference on Machine Vision Applications (MVA) (2015)
- Procedural content generation for games: a survey. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)