Original article
Deep learning-based tree species mapping in a highly diverse tropical urban setting

https://doi.org/10.1016/j.ufug.2021.127241

Highlights

  • We show a CNN-based approach to map urban tree species with aerial photographs.

  • We tested our approach in a highly diverse tropical urban setting.

  • We achieved an F1-score of 79.3 ± 8.6% for mapping nine tree species.

  • Our approach is helpful to produce urban tree species composition maps.

Abstract

Spatially explicit information on urban tree species distribution is crucial for green infrastructure management in cities. This information is usually acquired with ground-based surveys, which are time-consuming and usually cover limited spatial extents. The combination of machine learning algorithms and remote sensing images has been hailed as a promising way to map tree species over broad areas. Recently, convolutional neural networks (CNNs), a type of deep learning method, have achieved outstanding results for tree species discrimination in various remote sensing data types. However, there is a lack of studies using CNN-based methods to produce tree species composition maps, particularly for tropical urban settings. Here, we propose a multi-task CNN to map tree species in a highly diverse neighborhood in Rio de Janeiro, Brazil. Our network architecture takes an aerial photograph (RGB bands and pixel size = 0.15 m) and delivers two outputs: a semantically segmented image and a distance map transform. In the former, all pixel positions are labeled, while in the latter, each pixel position contains the Euclidean distance to the crown boundary. We developed a post-processing approach that combines the two outputs, and we classified nine and five tree species with average F1-scores of 79.3 ± 8.6% and 87.6 ± 4.4%, respectively. Moreover, our post-processing approach produced a realistic tree species composition map by labeling only pixels of the target species with high class membership probabilities. Our results show the potential of CNNs and aerial photographs to map tree species in highly diverse tropical urban settings, providing valuable insights for urban forest management and green space planning.

Introduction

Urban trees provide essential ecosystem services (ES) such as air pollution mitigation, surface temperature reduction, and carbon sequestration for the inner-city population. Moreover, they have been hailed as a nature-based solution to mitigate and adapt to climate change impacts (Hobbie and Grimm, 2020). Besides ES, urban trees also provide some disservices that can negatively affect human well-being (Lyytimäki, 2014). Examples of disservices are allergies caused by pollen emissions, pest outbreaks, and physical damage from falling trees or branches.

Many ecosystem services and disservices of urban trees are provided as a function of species identity (Cariñanos et al., 2017). Information on the spatial distribution of tree species in urban areas may help planners and decision-makers manage green spaces, maximizing their ability to produce goods and services while reducing disservices. This information is usually acquired with ground-based surveys of individual trees, which are costly and time-consuming. A promising way to acquire spatially explicit information on tree species distribution over large areas is to combine remote sensing images and machine learning methods.

Conventionally, tree species mapping is performed using object-based approaches, also known as Geographic Object-Based Image Analysis (GEOBIA), consisting of two separate steps: individual tree crown (ITC) delineation followed by classification. ITC delineation is not a trivial task. It is performed with image segmentation algorithms that rely on subjective and arbitrary parameter settings. The incorrect choice of parameters may lead to undesired results, such as under-segmentation and over-segmentation, which may impact the classification accuracy. The complexity of ITCs in terms of shapes and sizes challenges the generalization capability of segmentation techniques.

Moreover, the classification step of object-based approaches requires the extraction of species-specific characteristics performed by user-guided feature engineering. Deep learning methods like convolutional neural networks (CNNs) overcome the limitations of GEOBIA by simultaneously detecting and classifying ITCs in an end-to-end fashion, that is, without user intervention.

Tree species discrimination with CNN-based methods can be conducted using four approaches: scene classification, object detection, semantic segmentation, and instance segmentation. Scene classification aims to identify the presence of a particular tree species in an image given as input. Object detection methods build a bounding box encompassing the tree crown, locating it within the input image. Semantic segmentation refers to the process of assigning a label to all pixels of a target tree species in the input image. Instance segmentation aims to label every pixel of an ITC and outline its exact shape, combining object detection and semantic segmentation concepts.

Several studies demonstrated the potential of CNNs to discriminate among tree species in forest environments using RGB images (Schiefer et al., 2020; Onishi and Ise, 2021; Wagner et al., 2020a, 2020b, 2019; Ferreira et al., 2020; Kattenborn et al., 2020; Kattenborn et al., 2019), hyperspectral imagery (Liao et al., 2018; Fricker et al., 2019; Trier et al., 2018; Nezami et al., 2020; Sothe et al., 2020; Miyoshi et al., 2020) and light detection and ranging (LiDAR) data (Weinstein et al., 2020; Guan et al., 2015). In urban areas, most studies focused on temperate regions. Hartling et al. (2019) used a CNN designed for scene classification to discriminate urban tree species in St. Louis, MO, USA. The authors combined satellite images and LiDAR data, achieving an average accuracy of 80.8% for classifying eight tree species. Zhang et al. (2020) compared different CNN architectures for scene classification to identify ten urban tree species in a dataset composed of thousands of tree canopy images. The overall accuracy varied between 84.6% and 92.6%. The works mentioned above did not map the spatial distribution of tree species because they relied on CNNs for scene classification.

The studies performed in tropical urban settings focused on classifying a single species or tree cover in general. Santos et al. (2019) and Lobo Torres et al. (2020) used CNN-based methods for object detection and semantic segmentation, respectively, to detect trees of Dipteryx alata, reporting accuracy values greater than 85%. Wagner and Hirye (2019) used the U-net (Ronneberger et al., 2015) model and aerial photographs (ground sampling distance (GSD) = 1 m) to produce urban tree cover maps of the metropolitan region of São Paulo, Brazil. Timilsina et al. (2019) developed a CNN architecture to perform fine-scale segmentation of tree cover in very-high-resolution (VHR) orthophotos (GSD = 0.15 m) that were acquired over an urban area in Tasmania, Australia. No previous study employed CNNs to automatically discriminate multiple tree species in highly diverse tropical urban settings. Moreover, there is a methodological gap to be filled regarding the production of urban tree species composition maps.

This study proposes a new multi-task CNN to retrieve tree species composition from VHR aerial photographs acquired over urban areas. More specifically, our CNN architecture has task-specific layers to perform semantic segmentation and detection of individual trees. We tested our approach by mapping nine tree species in a highly diverse neighborhood in Rio de Janeiro, Brazil.

Section snippets

Study area

The study area is located in the city of Rio de Janeiro, Brazil (Fig. 1). It comprises the urbanized domain of the Grajaú neighborhood, an area of about 325 ha. We chose the Grajaú neighborhood because of the high number of trees and diversity of species. For example, according to the forest inventory of Rio de Janeiro (Giácomo, 2018), there are 2391 street trees from 109 species (Table A.1). The study area's mean annual temperature is 23.2 ± 5.5 °C, and it receives about 1278 mm of rain

CNN architecture

This work proposes a multi-task fully convolutional network for semantic segmentation and delineation of ITCs in urban regions. The network takes an input image and produces two outputs: a semantically segmented image and a distance map transform. The semantic segmentation output delivers labels to all image pixels, while the distance map gives the distance to the ITC boundary. The architecture consists of a shared encoder network and two task-specific decoder networks, a classification and a
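The regression target described above, a map holding each pixel's Euclidean distance to the crown boundary, can be illustrated with a small sketch. This is a brute-force NumPy illustration of the idea, not the authors' implementation; it assumes the target is derived from a binary crown mask, with distance measured to the nearest non-crown pixel:

```python
import numpy as np

def distance_to_boundary(mask):
    """Brute-force Euclidean distance transform of a binary crown mask.

    For each crown pixel (mask == 1), return the distance to the
    nearest background pixel (mask == 0); background pixels get 0.
    This mimics the regression target of the distance-map branch.
    """
    bg = np.argwhere(mask == 0)                     # background coordinates
    dist = np.zeros(mask.shape, dtype=float)
    for (i, j) in np.argwhere(mask == 1):
        d = np.sqrt(((bg - (i, j)) ** 2).sum(axis=1))
        dist[i, j] = d.min()
    return dist

# Toy 5x5 crown mask: the distance peaks at the crown centre,
# so crown centres and boundaries are easy to separate later.
mask = np.zeros((5, 5), dtype=int)
mask[1:4, 1:4] = 1
dmap = distance_to_boundary(mask)
```

In practice a library routine such as `scipy.ndimage.distance_transform_edt` would replace the double loop; the brute-force version is kept here only to make the definition explicit.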

Classification accuracy

Table 2 shows a comparison between the classification accuracy metrics obtained before and after post-processing (Section 3.3) for the nine-species dataset. One can note that the F1-score increased by at least eight percentage points for all species. P. rubra showed the largest improvement in F1-score with 27.1 percentage points. Table 2 shows the variability in the classification accuracy metrics obtained after training and testing the MT-EDv3 architecture (Fig. 3) with different sets of ITCs.
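The per-species F1-scores behind these comparisons follow directly from a confusion matrix. A minimal sketch with toy counts (not the study's data) shows the computation and the averaged F1 ± standard deviation reported above:

```python
import numpy as np

def f1_per_class(conf):
    """Per-class F1 from a confusion matrix (rows = reference, cols = predicted)."""
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)                    # correctly classified ITCs per species
    precision = tp / conf.sum(axis=0)     # column sums = predicted totals
    recall = tp / conf.sum(axis=1)        # row sums = reference totals
    return 2 * precision * recall / (precision + recall)

# Toy 3-species confusion matrix.
conf = [[8, 1, 1],
        [2, 7, 1],
        [0, 1, 9]]
f1 = f1_per_class(conf)
mean_f1, std_f1 = f1.mean(), f1.std()     # reported as mean ± std
```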

Tree species classification accuracy

We proposed a new approach to map tree species in urban areas using aerial photographs. Our multi-task CNN architecture outputs a labeled image and a regression map for each input image, thus improving the classification accuracy and producing a realistic species map of the study area. Previous studies used a regression branch for building footprint extraction (Bischke et al., 2019). Recently, Rosa et al. (2021) employed a regression branch to improve tree species mapping in a dense tropical
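The key property of the post-processing, labeling only pixels of the target species with high class membership probabilities, can be sketched in a few lines. This is a simplified NumPy illustration under assumed inputs (a softmax probability cube with class 0 as background and a predicted distance map), with illustrative thresholds rather than the study's values:

```python
import numpy as np

def postprocess(probs, dist_map, p_min=0.9, d_min=0.5):
    """Combine the two network outputs into a species map.

    probs    : (H, W, C) softmax class-membership probabilities,
               class 0 assumed to be background / non-target species.
    dist_map : (H, W) predicted distance to the crown boundary.
    A pixel is labeled with its most likely species only when that
    probability exceeds p_min and the pixel lies inside a crown
    (distance above d_min); everything else stays background (0).
    """
    labels = probs.argmax(axis=-1)
    confident = probs.max(axis=-1) >= p_min
    inside = dist_map >= d_min
    return np.where(confident & inside, labels, 0)

# Toy 2x2 tile, three classes: only confident in-crown pixels keep a label.
probs = np.array([[[0.05, 0.90, 0.05], [0.40, 0.30, 0.30]],
                  [[0.02, 0.03, 0.95], [0.10, 0.85, 0.05]]])
dist = np.array([[1.0, 1.0],
                 [1.0, 0.2]])
species = postprocess(probs, dist)
```

Thresholding on confidence in this way trades completeness for reliability, which is what lets the final map suppress uncertain mixed-crown pixels.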

Conclusions

This study proposes a multi-task CNN architecture that performs semantic segmentation and delineation of individual trees using aerial photographs. A DeepLabv3+ inspired architecture performs semantic segmentation with a ResNet backbone, and a regression branch carries out individual tree detection. We developed a post-processing procedure that combines semantic segmentation and regression outputs, and we obtained an average F1-score of 79.3 ± 8.6%. We show that the post-processing procedure

Author contributions

Gabriela Barbosa Martins: Validation, formal analysis, investigation, visualization, writing – original draft preparation, writing – review & editing. Laura Elena Cué La Rosa: Methodology, software, validation, formal analysis, investigation, writing – original draft preparation, writing – review & editing, supervision. Patrick Nigri Happ: Writing – original draft preparation, writing – review & editing. Luiz Carlos Teixeira Coelho Filho: Resources, data curation, writing – original draft

Declaration of competing interest

The authors report no declarations of interest.

Acknowledgements

We gratefully thank the Artificial Intelligence (AI) for Earth grant program of Microsoft Inc. for supporting this work. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan V GPU used for this research. We thank the Instituto Pereira Passos (IPP) for kindly providing the aerial photographs used in this study. M.P. Ferreira, R.Q. Feitosa, and L.E.C.L. Rosa were supported by the Brazilian National Council for Scientific and Technological Development (CNPq)

References (50)

  • P. Cariñanos et al.

    The cost of greening: disservices of urban trees

    The Urban Forest

    (2017)
  • L.C. Chen et al.

    Encoder-decoder with atrous separable convolution for semantic image segmentation

    Proceedings of the European Conference on Computer Vision (ECCV)

    (2018)
  • F. Chollet

    Xception: deep learning with depthwise separable convolutions

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2017)
  • G.A. Fricker et al.

    A convolutional neural network classifier identifies tree species in mixed-conifer forest from hyperspectral imagery

    Remote Sens.

    (2019)
  • R.G. Giácomo

    Arborio: Sistema de gestão da arborização urbana

    II Seminário sobre o Sistema Municipal de Informações Urbanas

    (2018)
  • H. Guan et al.

    Deep learning-based tree classification using mobile LiDAR data

    Remote Sens. Lett.

    (2015)
  • S. Hartling et al.

    Urban tree species classification using a WorldView-2/3 and LiDAR data fusion approach and deep learning

    Sensors

    (2019)
  • K. He et al.

    Deep residual learning for image recognition

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2016)
  • S.E. Hobbie et al.

    Nature-based approaches to managing climate change impacts in cities

    Philos. Trans. R. Soc. B

    (2020)
  • A. Howard et al.

    Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation

    (2018)
  • G. Huang et al.

    Densely connected convolutional networks

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2017)
  • T. Kattenborn et al.

    Convolutional neural networks enable efficient, accurate and fine-grained segmentation of plant species and communities from high-resolution UAV imagery

    Sci. Rep.

    (2019)
  • T. Kattenborn et al.

    Convolutional neural networks accurately predict cover fractions of plant species and communities in unmanned aerial vehicle imagery

    Remote Sens. Ecol. Conserv.

    (2020)
  • P. Krähenbühl et al.

    Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials

    (2012)