Parsing very high resolution urban scene images by learning deep ConvNets with edge-aware loss

https://doi.org/10.1016/j.isprsjprs.2020.09.019

Abstract

Parsing very high resolution (VHR) urban scene images into regions with semantic meaning, e.g. buildings and cars, is a fundamental task in urban scene understanding. However, due to the huge quantity of detail contained in an image and the large variations of objects in scale and appearance, existing semantic segmentation methods often break one object into pieces, or confuse adjacent objects, and thus fail to depict these objects consistently. To address these issues in a unified way, we propose a standalone end-to-end edge-aware neural network (EaNet) for urban scene semantic segmentation. To preserve semantic consistency inside objects, the EaNet model incorporates a large kernel pyramid pooling (LKPP) module to capture rich multi-scale context with strong continuous feature relations. To effectively separate confusing objects with sharp contours, a Dice-based edge-aware loss function (EA loss) is devised to guide the EaNet to refine both pixel- and image-level edge information directly from the semantic segmentation prediction. In the proposed EaNet model, the LKPP and the EA loss couple to enable comprehensive feature learning across an entire semantic object. Extensive experiments on three challenging datasets demonstrate that our method generalizes readily to multi-scale ground/aerial urban scene images, achieving 81.7% mIoU on the Cityscapes test set and a mean F1-score of 90.8% on the ISPRS Vaihingen 2D test set. Code is available at: https://github.com/geovsion/EaNet.

Introduction

Semantic segmentation of urban scene images aims to locate objects at the pixel level and assign them categorical labels, which supports a wide range of urban applications, such as urban mapping and 3D modeling, autonomous driving, urban land cover classification and change detection (Zhu et al., 2017, Marcos et al., 2018, Zhao et al., 2018a). However, as a dense pixel-wise classification task, semantic image segmentation faces major challenges in urban areas, due to the volume of detailed information contained in very high resolution (VHR) images and the large variations in the scale and appearance of objects. Large numbers of image details hamper the extraction of features relevant to the global structure and semantic information of urban objects. Meanwhile, objects with large scale variation that frequently co-occur in an image, such as large buildings and small cars, make it difficult to balance segmentation quality across objects of different sizes. Moreover, the existence of many confusing categories, like trees and meadows, or similar objects with diverse appearances, like cars, makes it hard to achieve intra-class unification and inter-class discrimination simultaneously when parsing urban scenes.

Extensive investigations of the challenging urban scene parsing task have been presented based on convolutional neural networks (ConvNets) (Yang et al., 2018, Yu et al., 2018a, Zhao et al., 2018b), owing to the ability of ConvNets to learn hierarchical features and capture rich context (Chen et al., 2018). In particular, ConvNets based on the fully convolutional network (FCN) have become the mainstream approach for urban scene parsing since the success of the first end-to-end FCN for semantic segmentation (Long et al., 2015). However, the powerful abstraction capability of ConvNets in data-driven learning tasks creates two technical hurdles: imbalanced attention to multi-scale objects and loss of detail during encoding. Targeting these two issues, much effort has been devoted to improving semantic segmentation (Liu et al., 2018a, Yu et al., 2018b).

In urban scene semantic segmentation, when the objects in an image vary in scale, a neural network with an inappropriately sized receptive field will give unbalanced attention to differently sized objects. A network with a small receptive field pays more attention to small objects and divides larger objects into fragments, while one with a large receptive field ignores details and fails to separate small adjacent objects. Common solutions for multi-scale object segmentation focus on receptive field enlargement (Chen et al., 2018b, Zhao et al., 2017). Many methods were developed with image pyramids (Zhao et al., 2018) or extra subnetworks (Yang et al., 2018), but such methods are time-consuming. A more popular way is to deploy a spatial pyramid pooling (SPP) module in the network architecture (Chen et al., 2018b, Yuan and Wang, 2018, He et al., 2019). However, current SPPs have difficulty capturing relational information between long-range features while retaining continuity between neighboring features (Wang et al., 2018), due to inappropriate receptive field size design. Thus, when balancing the segmentation quality of multi-scale urban objects, large objects still tend to be divided into fragments.
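To make the receptive-field discussion concrete, the sketch below shows a minimal PyTorch implementation of an SPP-style context module with parallel dilated branches, in the spirit of atrous pyramid designs such as that of Chen et al. (2018b). The branch width and dilation rates are illustrative assumptions, not the LKPP configuration, which is detailed in Section 3.

```python
import torch
import torch.nn as nn

class PyramidContextModule(nn.Module):
    """Illustrative SPP-style context module: parallel dilated 3x3 branches
    enlarge the receptive field at several scales; branch outputs are
    concatenated and fused by a 1x1 convolution. Branch width and dilation
    rates here are assumptions for illustration only."""

    def __init__(self, in_ch, branch_ch=256, dilations=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, branch_ch, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(branch_ch),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        )
        self.fuse = nn.Conv2d(branch_ch * len(dilations), branch_ch, 1)

    def forward(self, x):
        # Each branch sees a different receptive field; concatenation keeps
        # both short-range and long-range context before fusion.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

# e.g. context = PyramidContextModule(in_ch=2048)(encoder_features)
```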

Another inevitable problem in urban scene semantic segmentation with ConvNets is detail degradation caused by downsampling. Detail degradation affects the accurate localization of objects at the pixel level, leading to blurry object boundaries. To tackle this problem, numerous methods have concentrated on enhancing the sensitivity of a model to boundary information. One way is to employ post-processing techniques such as a conditional random field (CRF) (Paisitkriangkrai et al., 2015, Sherrah, 2016, Chen et al., 2018b), which comes with high computational costs. The other relies on applying an extra edge-extraction sub-network (Cheng et al., 2017, Liu et al., 2018b) or even an individual edge detection model like HED (Xie and Tu, 2015, Marmanis et al., 2018) to merge boundary information during segmentation. However, employing extra edge detectors increases model complexity and requires more training parameters. Moreover, the edge detectors used in these methods only learn edge features with a pixel-level cross-entropy loss (CE loss) and are independent of the semantic feature learning of an object, which leads to incomplete learning across an entire object.
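To make the notion of pixel-level boundary supervision concrete, the following minimal PyTorch sketch derives a binary boundary mask from a semantic label map, which is the kind of target such edge sub-networks are trained against with a CE loss. This is a generic illustration; the max-pooling trick and the neighbourhood radius are our assumptions, not the edge extraction used by the cited methods.

```python
import torch
import torch.nn.functional as F

def label_boundaries(label, radius=1):
    """Derive a binary boundary mask from an integer label map of shape
    (B, H, W): a pixel is marked as boundary if any neighbour within
    `radius` carries a different class label."""
    lab = label.float().unsqueeze(1)                          # (B, 1, H, W)
    k = 2 * radius + 1
    lmax = F.max_pool2d(lab, k, stride=1, padding=radius)     # local max label
    lmin = -F.max_pool2d(-lab, k, stride=1, padding=radius)   # local min label
    return (lmax != lmin).float()                             # (B, 1, H, W)
```

An edge sub-network would then be trained to predict such a mask pixel by pixel, whereas the EA loss introduced below derives edge information from the segmentation prediction itself.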

In this paper, we propose an edge-aware neural network (EaNet) for precise semantic segmentation of urban scenes. For the basic architecture of EaNet, we deploy a balanced encoder-decoder structure with skip pathways. To address the two aforementioned issues in a unified framework, we append two modules, i.e., a large kernel pyramid pooling (LKPP) module and a Dice-based edge-aware loss function (EA loss), on top of the encoder and the decoder of EaNet, respectively. The LKPP captures rich context information at multiple scales and builds strong continuous relations between long-range and neighboring features by constructing several branches with densely extending receptive field sizes. It effectively strengthens semantic unification inside objects and prevents them from being segmented into fragments. The EA loss optimizes segmentation predictions via a standard cross-entropy loss and learns edge information directly from the segmentation prediction map using a Dice-based edge loss. In this way, the EA loss module works at both the pixel and image level with no extra training parameters, which is more efficient and effective than many existing solutions for object boundary learning. By integrating the LKPP and the EA loss in a single one-stream EaNet model, the two modules communicate directly through forward and backward propagation, enabling more comprehensive learning of semantic objects than many existing methods.
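As a rough illustration of how a Dice-based edge term can be computed directly from the segmentation prediction without extra trainable parameters, the following is a minimal PyTorch sketch. The Laplacian edge operator, the assumption that every pixel carries a valid label, and the loss weighting are illustrative choices, not the exact EA loss formulation given in Section 3.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeAwareLoss(nn.Module):
    """Sketch of a pixel-level cross-entropy loss combined with a Dice-based
    edge term computed directly from the prediction map (no extra trainable
    parameters). Assumes every pixel has a valid label in [0, num_classes)."""

    def __init__(self, num_classes, edge_weight=1.0):
        super().__init__()
        self.num_classes = num_classes
        self.edge_weight = edge_weight
        self.ce = nn.CrossEntropyLoss()
        # Fixed 3x3 Laplacian kernel as a differentiable edge extractor
        # (an illustrative choice of edge operator).
        lap = torch.tensor([[0., 1., 0.],
                            [1., -4., 1.],
                            [0., 1., 0.]]).view(1, 1, 3, 3)
        self.register_buffer("lap", lap)

    def _edges(self, x):
        # Channel-wise Laplacian response of a (B, C, H, W) probability map.
        c = x.shape[1]
        e = F.conv2d(x, self.lap.repeat(c, 1, 1, 1), padding=1, groups=c)
        return e.abs().clamp(0., 1.)

    def forward(self, logits, target):
        ce = self.ce(logits, target)                    # pixel-level term
        prob = torch.softmax(logits, dim=1)
        onehot = F.one_hot(target, self.num_classes).permute(0, 3, 1, 2).float()
        pred_e, gt_e = self._edges(prob), self._edges(onehot)
        inter = (pred_e * gt_e).sum()
        dice = 1. - (2. * inter + 1.) / (pred_e.sum() + gt_e.sum() + 1.)
        return ce + self.edge_weight * dice             # edge-aware term
```

Because the edge maps are derived from the full prediction and label tensors, the Dice term couples boundary quality to the same forward and backward pass that drives semantic learning, which is the behaviour described above.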

The EaNet is standalone and elegant, and generalizes well even to very large-scale urban scene data. Moreover, the two proposed modules, LKPP and EA loss, can be easily applied to other FCN frameworks. The main contributions of this work are as follows:

  • We propose a simple yet effective edge-aware neural network (EaNet) for a comprehensive learning of semantic objects.

  • An LKPP module is proposed to densely capture rich multi-scale context with strongly continuous feature relations, and thus robustly segment multi-scale urban objects with high intra-class consistency.

  • The EA loss module refines object boundaries directly from the segmentation prediction at both the pixel and image level, which significantly improves the discrimination of confusing urban objects. The module provides a new loss function for simultaneous semantic category and edge structure learning, which is superior to existing combined solutions.

  • We validate the proposed EaNet on three datasets with very different characteristics, i.e., Cityscapes, ISPRS Vaihingen 2D and WHU Aerial Buildings, and show that EaNet generalizes well to multi-scale ground/aerial urban scene data, achieving competitive performance.

The rest of this paper is organized as follows. Related work is reviewed in Section 2. The architecture of EaNet and its components are detailed in Section 3. The performance of the two general modules and the complete EaNet is evaluated in Section 4. Conclusions are drawn in Section 5.

Related work

Extensive work has been presented on urban scene semantic segmentation employing ConvNets, in both the computer vision and remote sensing fields (Zhu et al., 2017, Chen et al., 2018b). In this section, we briefly review the works most relevant to the two technical hurdles in urban scene parsing, i.e., imbalanced attention to multi-scale objects and loss of boundary detail during encoding.

Architecture of the proposed EaNet

In this section, we discuss the architecture of the proposed EaNet and its two major components in detail, starting with an overview of the EaNet workflow in general.

Experiments

We conducted experiments on three datasets, including a large-scale ground dataset, i.e., Cityscapes (Cordts et al., 2016), and two relatively small-scale aerial datasets, i.e., ISPRS Vaihingen 2D (Gerke, 2014) and the WHU Aerial Building Dataset (Ji et al., 2018), in order to comprehensively test the learning capacity and generalizability of the proposed EaNet model. Ablation studies were conducted for the two general modules, i.e., LKPP and EA loss, individually, to verify their efficacy when

Conclusion

In this paper, we propose an edge-aware neural network (EaNet) with large kernel pyramid pooling for robust semantic segmentation in urban areas. Extensive ablation experiments show that the proposed EaNet adapts to both ground and aerial urban scene images, and achieves consistently excellent performance on three benchmark datasets, i.e., Cityscapes, ISPRS Vaihingen, and the WHU Aerial Building datasets. Qualitative and quantitative analysis results verify that the two introduced modules,

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research was funded by the National Key Research and Development Program of China under Grant 2018YFB0505401, the National Natural Science Foundation of China under Grants 41701445, 41871361 and 42071370, and the Fundamental Research Funds for the Central Universities.

References (58)

  • Chen, L., et al., 2016. Attention to scale: scale-aware semantic image segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • Cheng, D., et al., 2017. FusionNet: Edge aware deep convolutional networks for semantic segmentation of remote sensing harbor images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.
  • Cordts, M., et al., 2016. The Cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • Ding, H., et al., 2019. Boundary-aware feature propagation for scene segmentation. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV).
  • Ding, X., et al., 2019. ACNet: Strengthening the kernel skeletons for powerful CNN via asymmetric convolution blocks. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV).
  • Gerke, M., 2014. Use of the stair vision library within the ISPRS 2D semantic labeling benchmark (Vaihingen). Technical Report.
  • Ghassemi, S., et al., 2019. Learning and adapting robust features for satellite image segmentation on heterogeneous data sets. IEEE Trans. Geosci. Remote Sens.
  • He, J., et al., 2019. Adaptive pyramid context network for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • He, K., et al., 2017. Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV).
  • He, K., et al., 2016. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • Ji, S., et al., 2018. Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set. IEEE Trans. Geosci. Remote Sens.
  • Jiang, J., et al., 2017. Incorporating depth into both CNN and CRF for indoor semantic segmentation. In: 2017 8th IEEE International Conference on Software Engineering and Service Science (ICSESS).
  • Kang, W., et al., 2019. EU-Net: an efficient fully convolutional network for building extraction from optical remote sensing images. Remote Sensing.
  • Krähenbühl, P., et al., 2011. Efficient inference in fully connected CRFs with Gaussian edge potentials. In: Advances in Neural Information Processing Systems.
  • Lin, G., et al., 2017. RefineNet: Multi-path refinement networks for high-resolution semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • Liu, H., et al., 2019. DE-Net: deep encoding network for building extraction from high-resolution remote sensing imagery. Remote Sensing.
  • Liu, Q., et al., 2020. Dense dilated convolutions' merging network for land cover classification. IEEE Trans. Geosci. Remote Sens.
  • Liu, S., et al., 2018. ERN: edge loss reinforced semantic segmentation network for remote sensing images. Remote Sensing.
  • Liu, Z., et al., 2016. Semantic image segmentation via deep parsing network. In: IEEE International Conference on Computer Vision (ICCV).