CT-UNet: Context-Transfer-UNet for Building Segmentation in Remote Sensing Images

Neural Processing Letters

Abstract

With the proliferation of remote sensing images, segmenting buildings accurately has become a critical challenge, and two problems stand out. First, most networks recognize high-resolution images poorly, which blurs boundaries in the segmented building maps. Second, the similarity between buildings and background causes intra-class inconsistency. To address these two problems, we propose a UNet-based network named Context-Transfer-UNet (CT-UNet). Specifically, we design a Dense Boundary Block: the Dense Block uses a feature-reuse mechanism to refine features and improve recognition, and the Boundary Block introduces low-level spatial information to resolve the fuzzy-boundary problem. Then, to handle intra-class inconsistency, we construct a Spatial Channel Attention Block, which combines spatial context information and selects more discriminative features along the spatial and channel dimensions. Finally, we propose an improved loss function that adds an evaluation indicator to the loss, so that training is tied more directly to the evaluation metric. CT-UNet achieves 85.33% mean IoU on the Inria dataset, 91.00% mean IoU on the WHU dataset and 83.92% F1-score on the Massachusetts dataset. These results outperform our baseline (U-Net ResNet-34) by 3.76%, exceed Web-Net by 2.24% and surpass HFSA-Unet by 2.17%.
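
For readers unfamiliar with the quantities named in the abstract, the following is a minimal sketch of IoU and F1-score for a binary building mask, together with one possible way of adding an evaluation indicator to a loss. The abstract does not give the exact form of the improved loss or say how the mean IoU is averaged, so the function names, the choice of binary cross-entropy plus a soft IoU term, and the `weight` parameter are all assumptions made for illustration, not the authors' formulation.

```python
import math

def iou_and_f1(pred, target):
    """IoU and F1 for one pair of binary masks given as flat lists of 0/1."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    union = tp + fp + fn
    iou = tp / union if union else 1.0      # both masks empty: count as perfect
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 1.0
    return iou, f1

def bce_plus_soft_iou(probs, target, weight=1.0, eps=1e-7):
    """Hypothetical combined loss: cross-entropy + weight * (1 - soft IoU).

    `probs` are predicted building probabilities in [0, 1]; `target` is 0/1.
    The soft IoU replaces hard counts with probabilities so that the
    metric-like term stays differentiable in a real training framework.
    """
    n = len(probs)
    bce = -sum(
        t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
        for p, t in zip(probs, target)
    ) / n
    inter = sum(p * t for p, t in zip(probs, target))
    union = sum(p + t - p * t for p, t in zip(probs, target))
    soft_iou = (inter + eps) / (union + eps)
    return bce + weight * (1.0 - soft_iou)

# Toy example: 4 pixels, one true positive, one false positive, one false negative.
iou, f1 = iou_and_f1([1, 1, 0, 0], [1, 0, 1, 0])
good = bce_plus_soft_iou([0.9, 0.1], [1, 0])
bad = bce_plus_soft_iou([0.1, 0.9], [1, 0])
```

In this toy case the loss for the nearly correct prediction (`good`) is lower than for the inverted one (`bad`), which is the behaviour one would expect from any loss that includes a metric-based term.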

References

  1. Adiba A, Hajji H, Maatouk M (2019) Transfer learning and u-net for buildings segmentation. In: Proceedings of the new challenges in data sciences: acts of the second conference of the Moroccan classification society, ACM, p 14

  2. Aptoula E (2013) Remote sensing image retrieval with global morphological texture descriptors. IEEE Trans Geosci Remote Sens 52(5):3023–3034

  3. Badrinarayanan V, Kendall A, Cipolla R (2017) Segnet: A deep convolutional encoder–decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481–2495

  4. Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. https://arxiv.org/abs/1409.0473

  5. Bischke B, Helber P, Folz J, Borth D, Dengel A (2019) Multi-task learning for segmentation of building footprints with deep neural networks. In: 2019 IEEE international conference on image processing (ICIP), IEEE, pp 1480–1484

  6. Fourure D, Emonet R, Fromont E, Muselet D, Tremeau A, Wolf C (2017) Residual conv-deconv grid network for semantic segmentation. https://arxiv.org/abs/1707.07958

  7. Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier neural networks. In: Proceedings of the fourteenth international conference on artificial intelligence and statistics, pp 315–323

  8. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

  9. He N, Fang L, Plaza A (2020) Hybrid first and second order attention unet for building segmentation in remote sensing images. Sci China Inf Sci 63:140305

  10. Hu S, Ning Q, Chen B, Lei Y, Zhou X, Yan H, Zhao C, Tang T, Hu R (2020) Segmentation of aerial image with multi-scale feature and attention model. In: Artificial Intelligence in China, Springer, pp 58–66

  11. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4700–4708

  12. Ji S, Wei S, Lu M (2018) Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set. IEEE Trans Geosci Remote Sens 57(1):574–586

  13. Khalel A, El-Saban M (2018) Automatic pixelwise object labeling for aerial imagery using stacked u-nets. https://arxiv.org/abs/1803.04953

  14. Kim JH, Lee H, Hong SJ, Kim S, Park J, Hwang JY, Choi JP (2018) Objects segmentation from high-resolution aerial images using u-net with pyramid pooling layers. IEEE Geosci Remote Sens Lett 16(1):115–119

  15. Liu Y, Gross L, Li Z, Li X, Fan X, Qi W (2019) Automatic building extraction on high-resolution remote sensing imagery using deep convolutional encoder-decoder with spatial pyramid pooling. IEEE Access 7:128774–128786

  16. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440

  17. Luong MT, Pham H, Manning CD (2015) Effective approaches to attention-based neural machine translation. https://arxiv.org/abs/1508.04025

  18. Maggiori E, Tarabalka Y, Charpiat G, Alliez P (2017) Can semantic labeling methods generalize to any city? the inria aerial image labeling benchmark. In: 2017 IEEE international geoscience and remote sensing symposium (IGARSS), IEEE, pp 3226–3229

  19. Mitra P, Shankar BU, Pal SK (2004) Segmentation of multispectral remote sensing images using active support vector machines. Pattern Recogn Lett 25(9):1067–1074

  20. Mnih V (2013) Machine learning for aerial image labeling. Citeseer

  21. Oktay O, Schlemper J, Folgoc LL, Lee M, Heinrich M, Misawa K, Mori K, McDonagh S, Hammerla NY, Kainz B, et al. (2018) Attention u-net: Learning where to look for the pancreas. https://arxiv.org/abs/1804.03999

  22. Pal M (2005) Random forest classifier for remote sensing classification. Int J Remote Sens 26(1):217–222

  23. Pan X, Yang F, Gao L, Chen Z, Zhang B, Fan H, Ren J (2019) Building extraction from high-resolution aerial imagery using a generative adversarial network with spatial and channel attention mechanisms. Remote Sens 11(8):917

  24. Qi HN, Yang JG, Zhong YW, Deng C (2004) Multi-class svm based remote sensing image classification and its semi-supervised improvement scheme. In: Proceedings of 2004 international conference on machine learning and cybernetics (IEEE Cat. No. 04EX826), IEEE, vol 5, pp 3146–3151

  25. Ronneberger O, Fischer P, Brox T (2015) U-net: Convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention, Springer, pp 234–241

  26. Sebastian C, Imbriaco R, Bondarev E, de With PH (2020) Adversarial loss for semantic segmentation of aerial imagery. https://arxiv.org/abs/2001.04269

  27. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. https://arxiv.org/abs/1409.1556

  28. Singh PP, Garg R (2013) Automatic road extraction from high resolution satellite image using adaptive global thresholding and morphological operations. J Ind Soc Remote Sens 41(3):631–640

  29. Song L, Xu Y, Zhang L, Du B, Zhang Q, Wang X (2020) Learning from synthetic images via active pseudo-labeling. IEEE Transactions on Image Processing

  30. Tuermer S, Kurz F, Reinartz P, Stilla U (2013) Airborne vehicle detection in dense urban areas using hog features and disparity maps. IEEE J Select Top Appl Earth Observ Remote Sens 6(6):2327–2337

  31. Xia J, Du P, He X, Chanussot J (2013) Hyperspectral remote sensing image classification based on rotation forest. IEEE Geosci Remote Sens Lett 11(1):239–243

  32. Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhudinov R, Zemel R, Bengio Y (2015) Show, attend and tell: Neural image caption generation with visual attention. In: International conference on machine learning, pp 2048–2057

  33. Yi Y, Zhang Z, Zhang W, Zhang C, Li W, Zhao T (2019) Semantic segmentation of urban buildings from VHR remote sensing imagery using a deep convolutional neural network. Remote Sens 11(15):1774

  34. Yu C, Wang J, Peng C, Gao C, Yu G, Sang N (2018) Learning a discriminative feature network for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1857–1866

  35. Zhang L, Zhang L, Tao D, Huang X (2011) On combining multiple features for hyperspectral remote sensing image classification. IEEE Trans Geosci Remote Sens 50(3):879–893

  36. Zhang Y, Gong W, Sun J, Li W (2019) Web-net: A novel nest networks with ultra-hierarchical sampling for building extraction from aerial imageries. Remote Sens 11(16):1897

  37. Zhang Z, Liu Q, Wang Y (2018) Road extraction by deep residual u-net. IEEE Geosci Remote Sens Lett 15(5):749–753

  38. Zhao H, Shi J, Qi X, Wang X, Jia J (2017) Pyramid scene parsing network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2881–2890

  39. Zhao H, Zhang Y, Liu S, Shi J, Change Loy C, Lin D, Jia J (2018) Psanet: Point-wise spatial attention network for scene parsing. In: Proceedings of the European conference on computer vision (ECCV), pp 267–283

  40. Zhou Z, Siddiquee MMR, Tajbakhsh N, Liang J (2018) Unet++: A nested u-net architecture for medical image segmentation. In: Deep learning in medical image analysis and multimodal learning for clinical decision support, Springer, pp 3–11

Acknowledgements

This work was supported by the National Key R&D Program of China (No. 2018YFB1305200) and the Science Technology Department of Zhejiang Province (No. LGG19F020010). An earlier version of this paper was accepted at the International Conference on Pattern Recognition.

Author information

Corresponding author

Correspondence to Sheng Liu.

Cite this article

Liu, S., Ye, H., Jin, K. et al. CT-UNet: Context-Transfer-UNet for Building Segmentation in Remote Sensing Images. Neural Process Lett 53, 4257–4277 (2021). https://doi.org/10.1007/s11063-021-10592-w

  • DOI: https://doi.org/10.1007/s11063-021-10592-w
