Original article
Deep learning-based tree species mapping in a highly diverse tropical urban setting
Introduction
Urban trees provide essential ecosystem services (ES) such as air pollution mitigation, surface temperature reduction, and carbon sequestration for the inner-city population. Moreover, they have been hailed as a nature-based solution for mitigating and adapting to climate change impacts (Hobbie and Grimm, 2020). Besides ES, urban trees also produce disservices that can negatively affect human well-being (Lyytimäki, 2014), such as allergies caused by pollen emissions, pest outbreaks, and physical damage from falling trees or branches.
Many ecosystem services and disservices of urban trees are a function of species identity (Cariñanos et al., 2017). Information on the spatial distribution of tree species in urban areas may help planners and decision-makers manage green spaces, maximizing their capacity to deliver services while reducing disservices. This information is usually acquired through ground-based surveys of individual trees, which are costly and time-consuming. A promising way to acquire spatially explicit information on tree species distribution over large areas is to combine remote sensing images with machine learning methods.
Conventionally, tree species mapping is performed with object-based approaches, also known as Geographic Object-Based Image Analysis (GEOBIA), which consist of two separate steps: individual tree crown (ITC) delineation followed by classification. ITC delineation is not a trivial task. It is performed with image segmentation algorithms that rely on subjective and arbitrary parameter settings. An incorrect choice of parameters may lead to undesired results such as under-segmentation and over-segmentation, which in turn degrade classification accuracy. The variety of ITC shapes and sizes further challenges the generalization capability of segmentation techniques.
Moreover, the classification step of object-based approaches requires the extraction of species-specific characteristics performed by user-guided feature engineering. Deep learning methods like convolutional neural networks (CNNs) overcome the limitations of GEOBIA by simultaneously detecting and classifying ITCs in an end-to-end fashion, that is, without user intervention.
Tree species discrimination with CNN-based methods can be conducted using four approaches: scene classification, object detection, semantic segmentation, and instance segmentation. Scene classification aims to identify the presence of a particular tree species in an image given as input. Object detection methods build a bounding box encompassing the tree crown, locating it within the input image. Semantic segmentation refers to the process of assigning a label to all pixels of a target tree species in the input image. Instance segmentation aims to label every pixel of an ITC and outline its exact shape, combining object detection and semantic segmentation concepts.
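The four approaches differ most visibly in the shape of what they output. A minimal sketch, using hypothetical shapes for a single 256 × 256 RGB input and nine candidate species (the box coordinates and labels below are illustrative, not from the study):

```python
import numpy as np

H, W, N_SPECIES = 256, 256, 9

# Scene classification: one species label for the whole image.
scene_label = 3

# Object detection: one bounding box per crown (x1, y1, x2, y2, class).
boxes = np.array([[40, 60, 120, 150, 3],
                  [130, 20, 200, 95, 7]])

# Semantic segmentation: a species label (0 = background) for every pixel.
semantic_mask = np.zeros((H, W), dtype=np.int64)
semantic_mask[60:150, 40:120] = 3

# Instance segmentation: one binary mask per individual tree crown,
# combining detection (which tree) with segmentation (which pixels).
instance_masks = np.zeros((2, H, W), dtype=bool)
instance_masks[0, 60:150, 40:120] = True
instance_masks[1, 20:95, 130:200] = True
```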
Several studies have demonstrated the potential of CNNs to discriminate among tree species in forest environments using RGB images (Schiefer et al., 2020; Onishi and Ise, 2021; Wagner et al., 2019, 2020a, 2020b; Ferreira et al., 2020; Kattenborn et al., 2019, 2020), hyperspectral data (Liao et al., 2018; Fricker et al., 2019; Trier et al., 2018; Nezami et al., 2020; Sothe et al., 2020; Miyoshi et al., 2020), and light detection and ranging (LiDAR) data (Weinstein et al., 2020; Guan et al., 2015). In urban areas, most studies have focused on temperate regions. Hartling et al. (2019) used a CNN designed for scene classification to discriminate urban tree species in St. Louis, MO, USA; combining satellite images and LiDAR data, they achieved an average accuracy of 80.8% for eight tree species. Zhang et al. (2020) compared different CNN architectures for scene classification to identify ten urban tree species in a dataset composed of thousands of tree canopy images, with overall accuracy ranging from 84.6% to 92.6%. Because they relied on CNNs for scene classification, the works mentioned above did not map the spatial distribution of tree species.
The studies performed in tropical urban settings focused on classifying a single species or tree cover in general. Santos et al. (2019) and Lobo Torres et al. (2020) used CNN-based methods for object detection and semantic segmentation, respectively, to detect trees of Dipteryx alata, reporting accuracy values greater than 85%. Wagner and Hirye (2019) used the U-net (Ronneberger et al., 2015) model and aerial photographs (ground sampling distance (GSD) = 1 m) to produce urban tree cover maps of the metropolitan region of São Paulo, Brazil. Timilsina et al. (2019) developed a CNN architecture to perform fine-scale segmentation of tree cover in very-high-resolution (VHR) orthophotos (GSD = 0.15 m) that were acquired over an urban area in Tasmania, Australia. No previous study employed CNNs to automatically discriminate multiple tree species in highly diverse tropical urban settings. Moreover, there is a methodological gap to be filled regarding the production of urban tree species composition maps.
This study proposes a new multi-task CNN to retrieve tree species composition from VHR aerial photographs acquired over urban areas. More specifically, our CNN architecture has task-specific layers to perform semantic segmentation and detection of individual trees. We tested our approach by mapping nine tree species in a highly diverse neighborhood of Rio de Janeiro, Brazil.
Section snippets
Study area
The study area is located in the city of Rio de Janeiro, Brazil (Fig. 1). It comprises the urbanized domain of the Grajaú neighborhood, an area of about 325 ha. We chose the Grajaú neighborhood because of its high number of trees and diversity of species. For example, according to the forest inventory of Rio de Janeiro (Giácomo, 2018), there are 2391 street trees from 109 species (Table A.1). The study area's mean annual temperature is 23.2 ± 5.5 °C, and it receives about 1278 mm of rain
CNN architecture
This work proposes a multi-task fully convolutional network for semantic segmentation and delineation of ITCs in urban regions. The network takes an input image and produces two outputs: a semantically segmented image and a distance map transform. The semantic segmentation output delivers labels to all image pixels, while the distance map gives the distance to the ITC boundary. The architecture consists of a shared encoder network and two task-specific decoder networks, a classification and a
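Although the snippet truncates the full description, the general design it outlines (a shared encoder feeding a classification decoder and a distance-map regression decoder) might be sketched as follows. The layer widths, depths, and class count below are illustrative assumptions, not the authors' actual MT-EDv3 configuration:

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Sketch of a shared-encoder, two-decoder multi-task network:
    one head emits per-pixel species logits (semantic segmentation),
    the other regresses a normalized distance-to-boundary map."""

    def __init__(self, n_classes: int = 10):  # 9 species + background (assumed)
        super().__init__()
        # Shared encoder: two stride-2 convolutions (downsample 4x).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )

        def decoder(out_ch: int) -> nn.Sequential:
            # Task-specific decoder: upsample back to input resolution.
            return nn.Sequential(
                nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
                nn.ConvTranspose2d(32, out_ch, 2, stride=2),
            )

        self.seg_head = decoder(n_classes)  # semantic segmentation logits
        self.dist_head = decoder(1)         # distance-map regression

    def forward(self, x):
        feats = self.encoder(x)
        # Sigmoid keeps the predicted distances in [0, 1].
        return self.seg_head(feats), torch.sigmoid(self.dist_head(feats))
```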
Classification accuracy
Table 2 shows a comparison between the classification accuracy metrics obtained before and after post-processing (Section 3.3) for the nine-species dataset. One can note that the F1-score increased by at least eight percentage points for all species. P. rubra showed the largest improvement in F1-score with 27.1 percentage points. Table 2 shows the variability in the classification accuracy metrics obtained after training and testing the MT-EDv3 architecture (Fig. 3) with different sets of ITCs.
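For reference, the F1-score reported throughout is the harmonic mean of precision and recall; per species it can be computed from the true-positive (tp), false-positive (fp), and false-negative (fn) pixel or crown counts:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall for one species."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```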
Tree species classification accuracy
We proposed a new approach to map tree species in urban areas using aerial photographs. Our multi-task CNN architecture outputs a labeled image and a regression map for each input image, thus improving the classification accuracy and producing a realistic species map of the study area. Previous studies used a regression branch for building footprint extraction (Bischke et al., 2019). Recently, Rosa et al. (2021) employed a regression branch to improve tree species mapping in a dense tropical
Conclusions
This study proposes a multi-task CNN architecture that performs semantic segmentation and delineation of individual trees using aerial photographs. A DeepLabv3+ inspired architecture performs semantic segmentation with a ResNet backbone, and a regression branch carries out individual tree detection. We developed a post-processing procedure that combines semantic segmentation and regression outputs, and we obtained an average F1-score of 79.3 ± 8.6%. We show that the post-processing procedure
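The paper's exact post-processing is not reproduced in this snippet. The sketch below shows one plausible way to combine the two outputs: threshold the distance map to find crown "cores" (pixels far from any boundary), then assign every tree pixel to its nearest core. The threshold value and function signature are assumptions for illustration only:

```python
import numpy as np
from scipy import ndimage

def delineate_trees(sem_probs, dist_map, dist_thresh=0.3):
    """Split a semantic map into individual crowns using a predicted
    distance-to-boundary map.

    sem_probs: (C, H, W) class scores, class 0 = background
    dist_map:  (H, W) predicted normalized distance to the ITC boundary
    Returns (instance labels, semantic labels), both (H, W).
    """
    sem_labels = sem_probs.argmax(axis=0)
    # Crown cores: tree pixels well away from any crown boundary.
    cores = (dist_map > dist_thresh) & (sem_labels > 0)
    markers, n_trees = ndimage.label(cores)
    tree_mask = sem_labels > 0
    if n_trees:
        # For every pixel, find the coordinates of the nearest core pixel,
        # then copy that core's label over the whole tree mask.
        _, (ii, jj) = ndimage.distance_transform_edt(
            markers == 0, return_indices=True)
        instances = np.where(tree_mask, markers[ii, jj], 0)
    else:
        instances = np.zeros_like(markers)
    return instances, sem_labels
```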
Author contributions
Gabriela Barbosa Martins: Validation, formal analysis, investigation, visualization, writing – original draft preparation, writing – review & editing. Laura Elena Cué La Rosa: Methodology, software, validation, formal analysis, investigation, writing – original draft preparation, writing – review & editing, supervision. Patrick Nigri Happ: Writing – original draft preparation, writing – review & editing. Luiz Carlos Teixeira Coelho Filho: Resources, data curation, writing – original draft
Declaration of competing interest
The authors report no declarations of interest.
Acknowledgements
We gratefully thank the Artificial Intelligence (AI) for Earth grant program of Microsoft Inc. for supporting this work. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan V GPU used for this research. We thank the Instituto Pereira Passos (IPP) for kindly providing the aerial photographs used in this study. M.P. Ferreira, R.Q. Feitosa, and L.E.C.L. Rosa were supported by the Brazilian National Council for Scientific and Technological Development (CNPq)
References (50)
- et al. ResUNet-a: a deep learning framework for semantic segmentation of remotely sensed data. ISPRS J. Photogramm. Remote Sens. (2020)
- et al. Individual tree detection and species classification of Amazonian palms using UAV images and deep learning. For. Ecol. Manage. (2020)
- et al. Tree species classification in tropical forests using visible to shortwave infrared WorldView-3 images and texture analysis. ISPRS J. Photogramm. Remote Sens. (2019)
- et al. Mapping tree species in tropical seasonal semi-deciduous forests with hyperspectral and multispectral data. Remote Sens. Environ. (2016)
- Bad nature: newspaper representations of ecosystem disservices. Urban For. Urban Green. (2014)
- et al. Growth patterns and effects of urban micro-climate on two physiologically contrasting urban tree species. Landsc. Urban Plan. (2019)
- et al. Mapping forest tree species in high resolution UAV-based RGB-imagery by means of convolutional neural networks. ISPRS J. Photogramm. Remote Sens. (2020)
- et al. Individual tree crown delineation in a highly diverse tropical forest using very high resolution satellite images. ISPRS J. Photogramm. Remote Sens. (2018)
- et al. Cross-site learning in deep learning RGB tree crown detection. Ecol. Inform. (2020)
- et al. Multi-task learning for segmentation of building footprints with deep neural networks. 2019 IEEE International Conference on Image Processing (ICIP) (2019)
- The cost of greening: disservices of urban trees. The Urban Forest
- Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European Conference on Computer Vision (ECCV)
- Xception: deep learning with depthwise separable convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- A convolutional neural network classifier identifies tree species in mixed-conifer forest from hyperspectral imagery. Remote Sens.
- Arborio: Sistema de gestão da arborização urbana. II Seminário sobre o Sistema Municipal de Informações Urbanas
- Deep learning-based tree classification using mobile LiDAR data. Remote Sens. Lett.
- Urban tree species classification using a WorldView-2/3 and LiDAR data fusion approach and deep learning. Sensors
- Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Nature-based approaches to managing climate change impacts in cities. Philos. Trans. R. Soc. B
- Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation
- Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Convolutional neural networks enable efficient, accurate and fine-grained segmentation of plant species and communities from high-resolution UAV imagery. Sci. Rep.
- Convolutional neural networks accurately predict cover fractions of plant species and communities in unmanned aerial vehicle imagery. Remote Sens. Ecol. Conserv.
- Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials
Cited by (26)
- Improving urban tree species classification by deep-learning based fusion of digital aerial images and LiDAR. Urban Forestry and Urban Greening (2024)
- Two-step carbon storage estimation in urban human settlements using airborne LiDAR and Sentinel-2 data based on machine learning. Urban Forestry and Urban Greening (2024)
- Assessing the macro-scale patterns of urban tree canopy cover in Brazil using high-resolution remote sensing images. Sustainable Cities and Society (2024)
- Merging multiple sensing platforms and deep learning empowers individual tree mapping and species detection at the city scale. ISPRS Journal of Photogrammetry and Remote Sensing (2023)
- Machine learning and remote sensing integration for leveraging urban sustainability: A review and framework. Sustainable Cities and Society (2023)
- Nationwide urban tree canopy mapping and coverage assessment in Brazil from high-resolution remote sensing images using deep learning. ISPRS Journal of Photogrammetry and Remote Sensing (2023)