
Pattern Recognition Letters

Volume 150, October 2021, Pages 108-114

Semantic segmentation on Swiss3DCities: A benchmark study on aerial photogrammetric 3D pointcloud dataset

https://doi.org/10.1016/j.patrec.2021.06.004

Highlights

  • Importance of dataset size for pointcloud segmentation with deep learning.

  • Analysis of cross-city generalization for a deep pointcloud segmentation model.

  • Viability of simple model ensembling approaches to improve performance.

Abstract

We introduce a new outdoor urban 3D pointcloud dataset, covering a total area of 2.7 km², sampled from three Swiss cities with different characteristics. The dataset is manually annotated for semantic segmentation with per-point labels, and is built using photogrammetry from images acquired by multirotors equipped with high-resolution cameras. In contrast to datasets acquired with ground LiDAR sensors, the resulting point clouds are uniformly dense and complete, and are useful for diverse applications, including autonomous driving, gaming and smart city planning. As a benchmark, we report quantitative results of PointNet++, an established point-based deep 3D semantic segmentation model; on this model, we additionally study how training on different cities affects generalization.

Introduction

Many recent achievements of deep learning depend on the availability of large labeled training datasets [10], [27], such as ImageNet [9] for image classification and MS COCO [18] for image segmentation. In this work, we propose a new dataset of dense urban 3D pointclouds, spanning 2.7 km², acquired using photogrammetry from three cities in Switzerland (Zurich, Zug and Davos). The entire dataset is manually annotated with dense labels, which assign each point to one of five categories: terrain, construction, vegetation, vehicle, and urban asset.

The main goal of the dataset is to train semantic segmentation algorithms for urban environments. Semantic segmentation consists of partitioning the data into multiple sets of points, such that each set contains only objects of a given type. The problem is relevant for many real-world applications, such as autonomous driving, content generation for games [15], augmented reality applications, and city planning [35].
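As a concrete illustration of the task, the sketch below represents a labeled cloud as an N×3 coordinate array with a parallel vector of per-point class indices, then partitions it by class. The class encoding and array layout are our own illustrative assumptions, not the dataset's published format.

```python
import numpy as np

# Hypothetical integer encoding of the five categories; the dataset's
# actual label format may differ (illustrative assumption).
CLASSES = ["terrain", "construction", "vegetation", "vehicle", "urban asset"]

points = np.random.rand(1000, 3).astype(np.float32)     # stand-in (x, y, z) coordinates
labels = np.random.randint(0, len(CLASSES), size=1000)  # one class index per point

# Semantic segmentation partitions the cloud into per-class point sets.
for i, name in enumerate(CLASSES):
    print(f"{name}: {np.count_nonzero(labels == i)} points")
```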

Most existing datasets [3], [25], [30] for outdoor 3D semantic segmentation are motivated by real-time autonomous driving applications, and are therefore acquired at low resolution by street-level Light Detection and Ranging (LiDAR) sensors; this yields incomplete point clouds (for example, areas far from roads, such as roofs, are either not acquired or acquired at very low resolution), which are unsuitable for applications such as city planning, urban augmented or virtual reality (AR/VR), or gaming. In contrast, we acquire high-resolution photographs from unmanned aerial vehicles (UAVs) flying on a grid pattern over the area of interest, then reconstruct the 3D shape using photogrammetry; this allows us to densely acquire most outdoor surfaces. Similar approaches have been previously adopted for several applications, including automatic urban area mapping [20], damage detection [19], and cultural heritage site mapping for digital preservation [22]. Compared to 3D models built from satellite-borne cameras, this approach yields models with higher-resolution geometry and texture.
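As a rough illustration of such a survey flight, the following sketch generates a boustrophedon ("lawnmower") waypoint grid over a rectangular area of interest; the area size, line spacing, and altitude are illustrative assumptions, not the authors' actual flight parameters.

```python
import numpy as np

def grid_waypoints(width_m, height_m, spacing_m, altitude_m):
    """Boustrophedon ("lawnmower") waypoint grid covering a rectangular
    area of interest; line spacing controls image overlap."""
    xs = np.arange(0.0, width_m + 1e-9, spacing_m)
    ys = np.arange(0.0, height_m + 1e-9, spacing_m)
    waypoints = []
    for i, x in enumerate(xs):
        pass_ys = ys if i % 2 == 0 else ys[::-1]  # alternate direction each pass
        waypoints.extend((x, y, altitude_m) for y in pass_ys)
    return np.asarray(waypoints)

# Example: 500 m x 300 m area, 40 m line spacing, 80 m altitude (all illustrative).
print(grid_waypoints(500.0, 300.0, 40.0, 80.0).shape)
```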

High-resolution data yields more accurate models, and also aids the segmentation task because it contains more information to discriminate between classes. Current state-of-the-art models for 3D semantic segmentation rely on deep learning [4], [11], [36] and represent the input data as voxels [37], points [23] or meshes [17]; other approaches render multiple views of the 3D scene and then rely on 2D semantic segmentation models [5], [28], [29], which can be trained on the more abundant labeled 2D semantic segmentation datasets.
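To make the voxel representation concrete, the sketch below bins a point cloud into occupied voxel cells; the voxel size and scene extent are illustrative assumptions.

```python
import numpy as np

def voxelize(points, voxel_size):
    """Assign each point to an integer (i, j, k) voxel cell; returns the
    occupied cells and, for each point, the index of its cell."""
    ijk = np.floor(points / voxel_size).astype(np.int64)
    cells, point_to_cell = np.unique(ijk, axis=0, return_inverse=True)
    return cells, point_to_cell

points = np.random.rand(10000, 3) * 50.0  # stand-in 50 m x 50 m x 50 m scene
cells, point_to_cell = voxelize(points, voxel_size=1.0)
print(len(cells), "occupied 1 m voxels")
```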

To show the potential of our dataset for training and evaluating segmentation algorithms, we consider the well-established PointNet++ model [23], [24] and report its performance when using different splits for training and evaluation. In particular, the performance of machine learning models depends not only on the size of the training dataset, but also on how representative it is of the evaluation data: often, models trained on large amounts of data from a given environment fail to generalize to a different target environment. Because our dataset contains data from three cities with different characteristics, it can be used to explore this fundamental aspect.

The rest of the paper is organized into five sections. We first review related, commonly used datasets for 3D semantic segmentation in Section 2. In Section 3 we present our main contribution: a new pointwise-labeled multi-city dataset for semantic segmentation of outdoor 3D point clouds, which we release to the research community in three versions with different point densities; we characterize the dataset and describe the data acquisition, processing and manual labeling pipelines. In Section 4 we describe the deep learning model used to demonstrate the semantic segmentation task on our dataset. We discuss quantitative results in Section 5, where we also explore the model's generalization ability across different cities (our secondary contribution). Section 6 concludes the paper.


Related work

This section summarizes relevant pointcloud datasets with semantic segmentation labels (see Table 1). One fundamental difference among the datasets is their acquisition modality, i.e. LiDAR or photogrammetry.

Dataset description

We describe the process used to produce our large scale aerial photogrammetry dataset, covering both acquisition of source photographs and processing to obtain 3D point clouds. We conclude the section by detailing the data characteristics.

Semantic segmentation

To provide baseline performance metrics, we report experiments using PointNet++ [24], a well-established pointcloud segmentation approach.
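A core building block of PointNet++ is its set-abstraction layer, which subsamples the cloud with farthest point sampling (FPS) before aggregating local neighborhoods with a shared PointNet. Below is a minimal numpy sketch of FPS, for intuition only; it is not the benchmarked implementation.

```python
import numpy as np

def farthest_point_sampling(points, k, seed=0):
    """Greedy FPS: repeatedly pick the point farthest from those already
    chosen, yielding k indices that cover the cloud roughly uniformly."""
    rng = np.random.default_rng(seed)
    chosen = np.empty(k, dtype=np.int64)
    chosen[0] = rng.integers(len(points))
    # Distance of every point to its nearest already-chosen point.
    dist = np.linalg.norm(points - points[chosen[0]], axis=1)
    for i in range(1, k):
        chosen[i] = int(np.argmax(dist))
        dist = np.minimum(dist, np.linalg.norm(points - points[chosen[i]], axis=1))
    return chosen

pts = np.random.rand(4096, 3)
print(farthest_point_sampling(pts, 512).shape)  # (512,)
```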

Overall performance metrics

On the three testing tiles, the model trained on all cities yields an overall accuracy of 82.8%, average F1 of 56.0%, and average IoU score of 45.3%.
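For reference, these metrics follow directly from a per-class confusion matrix; the sketch below (with our own variable names, on toy data) computes overall accuracy, mean F1, and mean IoU.

```python
import numpy as np

def segmentation_metrics(conf):
    """conf[i, j] = number of points of true class i predicted as class j."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp                     # false positives per class
    fn = conf.sum(axis=1) - tp                     # false negatives per class
    overall_acc = tp.sum() / conf.sum()
    f1 = 2 * tp / np.maximum(2 * tp + fp + fn, 1)  # equals 2PR/(P+R)
    iou = tp / np.maximum(tp + fp + fn, 1)
    return overall_acc, f1.mean(), iou.mean()

conf = np.random.randint(0, 1000, size=(5, 5))     # toy 5-class confusion matrix
acc, mean_f1, mean_iou = segmentation_metrics(conf)
print(f"OA {acc:.3f}  mF1 {mean_f1:.3f}  mIoU {mean_iou:.3f}")
```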

Per-category performance

Table 2 reports, for each city and for each class, the performance of the model trained on data from all cities; namely, we report the per-class F1 score and IoU metrics.

We observe that the “urban asset” and “vehicle” classes are harder to segment than the other classes; this is expected due to their small size and widely variable characteristics.

Conclusion

This paper introduces a novel urban pointcloud dataset with pointwise semantic groundtruth. The dataset is constructed via photogrammetry on UAV-acquired high-resolution images of three Swiss cities. The dataset is released at three pointcloud densities: a sparse pointcloud with RGB colors and semantic labels, a regular-density pointcloud with RGB colors and semantic labels, and a dense pointcloud with only x, y, z coordinates.

The paper describes the acquisition and processing of the dataset, then reports baseline semantic segmentation results obtained with PointNet++.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported partially by Nomoko AG and the Swiss Confederation through Innosuisse research project 31889.1 IP-ICT and through the NCCR Robotics. The authors would like to thank the collaborators in the SUPSI-ISIN institute and the colleagues at Nomoko AG for their contributions, in particular Juan Vinuales and Mario Sanchez Gallardo for the drone flight operations, and Alexandre Ferreira do Carmo, Hugo Filipe Queiros da Cunha, Vincent Schmid, and Simon Scherer for supporting the

References (38)

  • S.A. Bello et al., Deep learning on 3D point clouds, Remote Sensing (Basel), 2020.
  • CapturingReality, RealityCapture, Drienova 3, 821 01 Bratislava, Slovakia, 2016.
  • B.O. Community, Blender - a 3D modelling and rendering package, Blender Foundation, Stichting Blender Foundation.
  • J. Deng et al., ImageNet: a large-scale hierarchical image database, 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
  • I. Goodfellow et al., Deep Learning, 2016.
  • Y. Guo et al., Deep learning for 3D point clouds: a survey, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
  • T. Hackel et al., Semantic3D.net: a new large-scale point cloud classification benchmark, ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2017.
  • J. Heller et al., 3D reconstruction from photographs by CMP SfM web service, 14th IAPR International Conference on Machine Vision Applications (MVA), 2015.
  • M. Hendrikx et al., Procedural content generation for games: a survey, ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 2013.