Elsevier

Remote Sensing of Environment

Volume 247, 15 September 2020, 111838

Open-source data-driven urban land-use mapping integrating point-line-polygon semantic objects: A case study of Chinese cities

https://doi.org/10.1016/j.rse.2020.111838

Highlights

  • Bridging the application gap between remote sensing and land use is important.

  • A point-line-polygon semantic object mapping (PLPSOM) framework is proposed.

  • PLPSOM uses open-source data to reduce time-consuming sample production.

  • PLPSOM represents urban land parcels by point-line-polygon semantic objects.

  • PLPSOM has been successfully applied in six areas of four Chinese cities.

Abstract

Reliable urban land-use maps are essential for urban analysis because the spatial distribution of land use reflects the complex environment of cities under the combined effects of nature and socio-economics. In recent years, very high resolution (VHR) remote sensing imagery interpretation has resolved the “semantic gap” between the low-level data and the high-level semantic scenes, and has been used to map urban land use. Nevertheless, the existing frameworks cannot easily be applied to practical urban analysis, which can be attributed to three main reasons: 1) the indistinguishable socio-economic attributes of the same ground object layouts; 2) the weak transferability of the supervised frameworks and the time-consuming training sample annotation; and 3) the category system inconsistency between the data source and the urban land-use application. In this paper, to achieve an “application gap” breakthrough for urban land-use mapping, a data-driven point, line, and polygon semantic object mapping (PLPSOM) framework is proposed, which makes full use of open-source VHR images and multi-source geospatial data. In the PLPSOM framework, point, line, and polygon semantic objects are represented by the points of interest (POIs), OpenStreetMap (OSM) data, and VHR images corresponding to the scenes in the land-use mapping units, respectively. OSM line semantic objects are utilized to supply the boundaries of the land-use mapping units for the POIs and VHR images, forming urban land parcels (street blocks). To reduce the cost of the data annotation, the training dataset is constructed using multiple open-source data sources. An enhanced deep adaptation network (EDAN) is then proposed to acquire the categories of the VHR scene images in the case of partial transfer learning. 
Finally, in order to meet the actual needs, a rule-based category mapping (RCM) model is applied to integrate the categories of the POIs and VHR images into the urban land-use category system, allowing us to acquire the land-use maps of the cities. The effectiveness of the proposed method was tested in four cities of China, including six specific areas: Beijing and Wuhan city centers; the Hanyang District of Wuhan; the Hannan District of Wuhan; Macao; and the Wan Chai area of Hong Kong, achieving a high classification accuracy. The “urban image” analysis confirmed the practicality of the obtained urban land-use maps.

Introduction

Urban environmental problems caused by the rapid growth of urbanization have received worldwide attention during the past few decades (Kamusoko, 2017; Kawamura et al., 1998). Urban land-use maps are commonly used as a tool in urban analysis for describing socio-economic functions and environmental conditions (Dewan and Yamaguchi, 2009; Herold et al., 2003). With the development of very high resolution (VHR) satellite technologies, it is now possible for VHR images to clearly represent the geometry, texture, size, position, and other relevant information of the ground objects at a finer scale (Wang et al., 2019; Zhang et al., 2018). However, the variability of the object categories and the diversity of the object distributions in the land-use mapping units lead to the “semantic gap” between the low-level data and the high-level semantic information (Zhao et al., 2013), which means that land-use mapping is still a challenge (Grippa et al., 2018; Huang et al., 2018).

With regard to remote sensing image understanding, “scenes” are commonly used to identify accurate urban land-use patterns (Liu et al., 2017). In recent years, many studies applying VHR images have attempted to map urban land use using a scene classification model. The basic idea is to apply the trained model to the target data, thereby assigning the target data the same categories as the training data (e.g., bridge, forest, airport) (Castelluccio et al., 2015; Huang et al., 2018; Nogueira et al., 2017). According to the feature level, these models can be grouped into four types (Huang et al., 2018): 1) models based on low-level features; 2) models based on mid-level features; 3) models based on object-level features; and 4) models based on high-level features. Low-level features are based on the assumption that the land use can be directly represented by its intrinsic features, as extracted or designed by experts. Examples of low-level features are shape features (Oliva and Torralba, 2001), texture features, and color features (Santos et al., 2010; Serrano et al., 2004). However, low-level features are insufficient to describe the variety and complexity of the true characteristics of the ground surface. Mid-level features focus on creating and coding a dictionary of the local low-level features, in order to describe the details more robustly (Zhao et al., 2016; Zhu et al., 2018). Object-level features are extracted from the results of object-oriented classification, focusing on the relationships between objects instead of low-level features (Cheng et al., 2017; Li et al., 2016; Zhong et al., 2017). Compared with the previous three types of handcrafted features, high-level features are learned autonomously through deep neural networks, such as convolutional neural networks (CNNs) (Yao et al., 2017b; Zhang et al., 2017) and fine-tuned CNNs (Hu et al., 2015; Tu et al., 2018), which can achieve relatively high land-use classification accuracies.

The previous land-use classification methods based on VHR images bridged the semantic gap between the low-level features of the objects and the high-level semantics of the scenes. However, through a survey of production practice, we have found that the obtained results cannot easily be applied in practical urban land-use mapping, on account of the “application gap” (Fig. 1). The main reasons for the land-use application gap are as follows.

  1) It is difficult for VHR images to express land use composed of ground objects with similar spatial layouts but completely different socio-economic information. For example, consider two land parcels that each contain several office buildings. The floors of the office buildings in one land parcel are occupied by enterprises, whereas those in the other are occupied by government organizations. These categories cannot be distinguished in VHR images.

  2) It is difficult to achieve cross-regional universality with the traditional scene classification model. In practical applications, the training dataset needs to be recreated for each research city, which consumes both time and manpower.

  3) The category systems for the scenes and urban land use are inconsistent, differing in both the number of categories and the meanings of the categories.

To extract socio-economic semantics for urban land-use mapping, multi-source geospatial data, such as social media data and volunteered geographic information (VGI) data, are being increasingly utilized in urban analysis (Soliman et al., 2017). Compared with social media data, VGI data are free and are updated in real time. Furthermore, the data format is diverse, allowing the data to reflect the status of a city from different aspects. For example, OpenStreetMap (OSM) polygon data are commonly used as ground-truth data (Audebert et al., 2017; Fonte et al., 2017), while road line data are often used as land parcel boundaries (Chen et al., 2016). Point of interest (POI) data are widely deployed for their geographic information, including location, place, category, density, and distribution (Rodrigues et al., 2012; Yao et al., 2017a). Land-use classification using POIs generally consists of three main steps: 1) hard reclassification of the POIs; 2) quadrat analysis or kernel density estimation to quantify the spatial distribution patterns of the POIs; and 3) unsupervised classification or supervised classification (Hu et al., 2016; Liu et al., 2017; Long and Liu, 2015). However, hard classifying point-scale POI categories into land-use types loses much of the information contained in the points. For example, incorporating the food and beverage category (a category in the Amap POI standards) into commercial and business facilities clearly violates the fact that the food and beverage category is also likely to occur in residential areas, parks, and industrial areas. Furthermore, POIs lack particular natural attributes, such as river, farmland, or meadow.
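The kernel density step in the POI workflow above can be illustrated with a minimal sketch. It assumes planar coordinates, a Gaussian kernel, and a fixed bandwidth; the function names `gaussian_kde_at` and `category_densities` and the sample category names are hypothetical and do not come from the cited studies. Each POI category keeps its own density rather than being hard-reclassified into a single land-use type.

```python
import math


def gaussian_kde_at(query, points, bandwidth):
    """Gaussian kernel density estimate at `query` from 2-D `points`."""
    if not points:
        return 0.0
    qx, qy = query
    total = 0.0
    for px, py in points:
        d2 = (qx - px) ** 2 + (qy - py) ** 2
        total += math.exp(-d2 / (2.0 * bandwidth ** 2))
    # Normalize so the estimate integrates to 1 over the plane.
    return total / (2.0 * math.pi * bandwidth ** 2 * len(points))


def category_densities(query, pois, bandwidth):
    """Per-category density at `query` for (x, y, category) POIs.

    Each category density is scaled by the category's share of all POIs,
    so the values sum to the density of the pooled POI set.
    """
    by_category = {}
    for x, y, category in pois:
        by_category.setdefault(category, []).append((x, y))
    n = len(pois)
    return {
        category: gaussian_kde_at(query, pts, bandwidth) * len(pts) / n
        for category, pts in by_category.items()
    }


# Hypothetical POIs: two food outlets near the origin, one residence far away.
sample_pois = [(0.0, 0.0, "food"), (1.0, 0.0, "food"), (10.0, 10.0, "residence")]
densities = category_densities((0.0, 0.0), sample_pois, bandwidth=1.0)
```

Keeping the per-category values side by side preserves the mixed character of a location, which is the information that hard reclassification discards.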

In fact, urban land-use identification is strongly associated with both the external physical characteristics and the inner socio-economic semantics. For these reasons, the current land-use classification methods combining VHR images with VGI data are used to recreate datasets and generate land-use maps for specific cities. OSM data are commonly used as boundaries, and a classifier is then used to classify the combined features extracted from the VHR images and POIs (Hu et al., 2016; Liu et al., 2017). However, this traditional land-use classification framework still faces three problems in practical urban analysis. Firstly, most of the existing methods employ supervised classification frameworks, which are not extensible for different cities. In addition, due to the poor transferability of the frameworks, it is necessary to recreate the training samples for different cities manually, resulting in a huge time cost. Secondly, in the traditional framework, the VHR images and POIs are fused at the feature level and, as such, it is difficult to analyze the constituent factors of complex land-use types. In actual circumstances, scenes may only be components of the land use, and the surrounding environment should also be taken into account. For example, basketball courts are commonly found in sports areas, but may also be found in education and science areas, such as university campuses. Thirdly, the categories of the input data used in the traditional methods are consistent with the categories of the output land use, which is not robust for different urban analyses.

Describing land parcels from the perspective of semantic objects can effectively explain the causes of the above problems. In geographic information systems, objects are modeled as points (or pixels), lines, and polygons (or areas) (Goodchild, 1993; Tang et al., 1996). In urban land-use classification research, the land-use mapping units can also be described by points, lines, and polygons (Fig. 2). Land-use mapping units segmented by regular grids cause the mosaic phenomenon, which is manifested as sawtooth boundaries and ambiguous geographical meanings. Therefore, it is necessary to use line objects, e.g., road networks, to segment the geographically meaningful land parcels. However, the semantics represented by the same objects at different scales can be different. The scales of polygon objects (i.e., VHR images) and point objects (i.e., POIs) are different from the land parcel scale of the land-use mapping units, so that they cannot express all the natural and socio-economic attributes. Fortunately, POIs can represent local details and contain socio-economic information, and VHR image polygons can express global information at a larger scale. For example, a land parcel may be “market” at the POI scale and “medium residential” or “playground” at the VHR image scale, but in fact is a “school” containing residential buildings at a larger polygon scale. Finally, the education and science land-use type is attached to the land parcel through the integration and category mapping of the point-line-polygon semantic objects. In addition, since the attributes of the POI and OSM data can reflect the semantic information, and the VHR images have semantic category information after being classified, the objects are called “semantic objects”, in order to distinguish them from ordinary objects.
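As a rough illustration of how the three object types relate, the sketch below models a land parcel as a polygon bounded by line objects and attaches point objects to it with a standard ray-casting point-in-polygon test. The class `LandParcel`, the helpers `contains` and `assign_pois`, and the sample labels are assumptions made for illustration only; the paper's own data model is not given in this excerpt.

```python
from dataclasses import dataclass, field


@dataclass
class LandParcel:
    """A street block whose boundary comes from line objects (e.g. roads)."""
    boundary: list                                     # [(x, y), ...] vertices
    poi_labels: list = field(default_factory=list)     # point semantic objects
    scene_labels: list = field(default_factory=list)   # polygon semantic objects


def contains(boundary, x, y):
    """Ray-casting point-in-polygon test (points on the boundary are not
    treated specially)."""
    inside = False
    n = len(boundary)
    for i in range(n):
        x1, y1 = boundary[i]
        x2, y2 = boundary[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_pois(parcels, pois):
    """Attach each (x, y, label) POI to the first parcel that contains it."""
    for x, y, label in pois:
        for parcel in parcels:
            if contains(parcel.boundary, x, y):
                parcel.poi_labels.append(label)
                break


# Two hypothetical adjacent street blocks and three POIs (one outside both).
west = LandParcel([(0, 0), (10, 0), (10, 10), (0, 10)])
east = LandParcel([(10, 0), (20, 0), (20, 10), (10, 10)])
assign_pois([west, east], [(5, 5, "market"), (15, 5, "school"), (50, 50, "bank")])
```

In practice a geometry library would replace the hand-written test; the point here is only that the line objects define the unit to which point and polygon semantics are attached.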

In this paper, to break through the application gap and meet the requirements of practical urban analysis, point-line-polygon semantic objects are used to map the land use in an open-source data-driven manner, and the point-line-polygon semantic object mapping (PLPSOM) framework is proposed. The main contributions of this paper can be summarized as follows.

  1) To reduce the time-consuming sample production, the proposed PLPSOM framework relies entirely on open-source data, including VHR images and VGI data. The VHR images provide clues from a visual perspective. With regard to the VGI data, OSM road networks and water channels are used to generate the land parcels, and POIs are used to provide the socio-economic information.

  2) To improve the transferability and universality of the PLPSOM framework, an enhanced deep adaptation network (EDAN) scene classification model is proposed to obtain the categories of the polygon semantic objects through the use of existing public datasets. The EDAN model introduces multi-scale grid sampling to reduce the distribution differences between the public datasets and the target images, and utilizes a weighted network to handle the case where the public datasets contain more categories than the target domains.

  3) To fuse the point and polygon semantic objects of the POIs and VHR images, and to resolve the inconsistency between the semantic object and land-use category systems, a rule-based category mapping (RCM) model is proposed, integrating the weighted bag-of-words (BoW) model and the Word2Vec model. The RCM model considers the mutual constraints between the semantic objects and the land use, to match the undefined land parcels with the correct land-use types. Finally, the PLPSOM framework was successfully applied to map urban land use in six areas of the four Chinese cities of Wuhan, Beijing, Hong Kong, and Macao.
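The idea of combining a weighted BoW with category rules can be sketched as follows. This is not the RCM model itself, whose rules and weights are not given in this excerpt: the functions `weighted_bow` and `map_land_use`, the rule table, and the weights are all hypothetical, and the Word2Vec similarity component is omitted entirely.

```python
from collections import Counter


def weighted_bow(labels_with_weights):
    """Build a normalized weighted bag-of-words from (label, weight) pairs."""
    bow = Counter()
    for label, weight in labels_with_weights:
        bow[label] += weight
    total = sum(bow.values())
    if not total:
        return {}
    return {label: weight / total for label, weight in bow.items()}


def map_land_use(bow, rules):
    """Return the land-use type whose indicative labels carry the most weight.

    `rules` maps a land-use type to a set of semantic-object labels.
    Returns None when no rule matches any label in the parcel.
    """
    scores = {
        land_use: sum(bow.get(label, 0.0) for label in labels)
        for land_use, labels in rules.items()
    }
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] <= 0:
        return None
    return best


# Hypothetical rule table and parcel contents (weights are made up).
rules = {
    "education and science": {"school", "playground"},
    "residential": {"medium residential"},
    "commercial": {"market"},
}
parcel_objects = [
    ("school", 2.0),              # POI label
    ("market", 1.0),              # POI label
    ("playground", 1.0),          # VHR scene label
    ("medium residential", 1.0),  # VHR scene label
]
parcel_bow = weighted_bow(parcel_objects)
parcel_land_use = map_land_use(parcel_bow, rules)
```

Because the semantic-object labels and the land-use types live in separate vocabularies, the rule table is the only place where the two category systems meet, so it can be swapped without touching the classifiers.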

Section snippets

Study areas

Six areas of cities with different levels of urbanization, city size, development degree, and urban structure were selected as the study areas: Beijing city center; Wuhan city center; the Hanyang District of Wuhan; the Hannan District of Wuhan; Macao; and the Wan Chai area of Hong Kong (Fig. 3). Beijing is not only the capital, but also the political, cultural, and educational center of China. According to the Beijing Municipal Bureau of Statistics, the population of Beijing reached 21.71

Methods

In order to suit the needs of land-use mapping in various cities, the PLPSOM framework is proposed. In the PLPSOM framework, the land parcels are generated as the cartography units based on the constraint of OSM line data. Naturally, the POI data and the VHR images are matched to the land parcels. Subsequently, the point semantic objects are acquired from the land parcels of POIs, inheriting the categories of the POIs. With regard to the polygon semantic objects, they are sampled from the land

Results

The presented results include five parts: 1) the land parcels obtained by segmentation; 2) the quantitative effect of the VHR image classification results; 3) the mixture of semantic objects in the land parcels; 4) visual interpretation of the land-use maps; and 5) the overall classification accuracy (OA) of the land-use mapping.
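For reference, overall accuracy in its usual definition is the share of evaluated units whose predicted class matches the reference class. The sketch below assumes an unweighted per-unit count; whether the evaluation in the paper is per parcel or area-weighted is not stated in this excerpt, and the function name `overall_accuracy` and the sample labels are illustrative.

```python
def overall_accuracy(reference, predicted):
    """Fraction of units whose predicted label equals the reference label."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    if not reference:
        raise ValueError("at least one sample is required")
    correct = sum(r == p for r, p in zip(reference, predicted))
    return correct / len(reference)


# Hypothetical labels for four land parcels; three of them agree.
reference_labels = ["residential", "commercial", "park", "industrial"]
predicted_labels = ["residential", "commercial", "residential", "industrial"]
oa = overall_accuracy(reference_labels, predicted_labels)
```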

Scalability of the land-use category system

Although the established land-use category system can meet the general needs of urban analysis, there are still some imperfections in the category system, due either to limitations in the expert knowledge or the additional requirements of the urban application. Therefore, it was necessary to explore the robustness of the PLPSOM framework with regard to the land-use category system.

Firstly, by reducing the land-use categories to nine (the level I categories in the MOHURD urban land-use

Conclusion

In this paper, the open-source data-driven PLPSOM framework has been proposed to efficiently produce land-use maps that meet the needs of urban analysis, using point-line-polygon semantic objects of VHR images and multi-source geospatial data. The OSM road network data, as line data, provide a boundary constraint for the land parcels. The POIs and VHR images, as point and polygon data, respectively, make up the semantic objects at local and global scales. The EDAN scene classification model

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The authors would like to thank the editor, associate editor, and anonymous reviewers for their helpful comments and advice. This work was supported by the National Key Research and Development Program of China under Grant No. 2017YFB0504202, the National Natural Science Foundation of China under Grant Nos. 41771385, 41622107, 41820104006 and 61871299, and the Open Research Fund of the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University under Grant No. 18E03.

References (44)

  • L. Fei-Fei et al. A Bayesian hierarchical model for learning natural scene categories.

  • C. Fonte et al. Generating up-to-date and detailed land use and land cover maps using OpenStreetMap and GlobeLand30. ISPRS Int. J. Geo-Inf. (2017)

  • M.F. Goodchild. The state of GIS for environmental problem-solving.

  • T. Grippa et al. Mapping urban land use at street block level using OpenStreetMap, remote sensing data, and spatial metrics. ISPRS Int. J. Geo-Inf. (2018)

  • K. He et al. Deep residual learning for image recognition.

  • M. Herold et al. Spatial metrics and image texture for mapping urban land use. Photogramm. Eng. Remote Sens. (2003)

  • F. Hu et al. Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery. Remote Sens. (2015)

  • T. Hu et al. Mapping urban land use by using Landsat images and open social data. Remote Sens. (2016)

  • J. Jokar Arsanjani et al. Toward mapping land-use patterns from volunteered geographic information. Int. J. Geogr. Inf. Sci. (2013)

  • C. Kamusoko. Importance of remote sensing and land change modeling for urbanization studies.

  • M. Kawamura et al. Comparison of urbanization of four Asian cities using satellite data. Doboku Gakkai Ronbunshu (1998)

  • X. Liu et al. Classifying urban land use by integrating remote sensing and social media data. Int. J. Geogr. Inf. Sci. (2017)