Strategies to access web-enabled urban spatial data for socioeconomic research using R functions

  • Original Article
Journal of Geographical Systems

Abstract

Since the introduction of the World Wide Web in the 1990s, the information available for research purposes has increased exponentially, leading to a significant proliferation of research based on web-enabled data. Nowadays, the use of internet-enabled databases, obtained from either primary online surveys or secondary official and non-official registers, is common. However, data availability varies by data category and country; in particular, collecting microdata at a low geographical level for urban analysis can be a challenge. The most common difficulties when working with secondary web-enabled data can be grouped into two categories: accessibility problems and availability problems. Accessibility problems arise when the way data are published on the servers blocks or delays the download process, which then becomes a tedious, repetitive task that can introduce errors when building large databases. Availability problems usually arise when official agencies restrict access to the information for reasons of statistical confidentiality. To overcome some of these problems, this paper presents different strategies based on URL parsing, PDF text extraction, and web scraping. A set of functions, available under a GPL-2 license, was built into an R package to extract and organize databases at the municipality level (NUTS 5) in Spain for population, unemployment, vehicle fleet, and firm characteristics.
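As a rough illustration of the URL-parsing idea mentioned above, the following sketch builds a batch of download addresses from a fixed pattern, so that a repetitive manual download can be replaced by a loop. It is not taken from the package: the function `build_urls`, the host `example.org`, and the path pattern are all hypothetical and only stand in for whatever an official server actually exposes.

```r
# Hypothetical URL pattern: the real endpoints are not given in this text,
# so example.org stands in for an official statistics server.
build_urls <- function(base, provinces, years) {
  # One row per province-year combination
  grid <- expand.grid(province = provinces, year = years,
                      stringsAsFactors = FALSE)
  # Zero-pad province codes to two digits, e.g. 6 becomes "06"
  code <- sprintf("%02d", as.integer(grid$province))
  # Fill the (assumed) path pattern: <base>/<year>/province_<code>.csv
  grid$url <- sprintf("%s/%d/province_%s.csv",
                      base, as.integer(grid$year), code)
  grid
}

urls <- build_urls("https://example.org/data",
                   provinces = c(6, 10), years = 2015:2016)
print(urls$url)
```

Each address could then be passed to `download.file()` inside `tryCatch()`, so that one failed request does not interrupt the whole batch.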

Notes

  1. We consider ‘web-enabled’ to be different from ‘web-based’, which relates to methods used in psychology and behavioral studies (Skitka and Sargis 2006; Denissen et al. 2010).

  2. According to Eurostat, the LAUs (Local Administrative Units) are subdivisions of the NUTS 3 regions, which consist of municipalities or equivalent units (formerly NUTS 5). The NUTS classification (Nomenclature of territorial units for statistics) is a hierarchical system for dividing the economic territory of the EU. NUTS 1 are major socio-economic regions (e.g. Spain), NUTS 2 are basic regions for the application of regional policies (e.g. autonomous community of Extremadura) and NUTS 3 are small regions for specific diagnoses (e.g. province of Badajoz).

  3. This R package is freely available from the site https://github.com/amvallone/DataSpa. It must be installed in the R console with the command: devtools::install_github("amvallone/DataSpa").

  4. http://www.ine.es.

  5. http://www.sepe.es.

  6. All the R functions are in the aforementioned repository: https://github.com/amvallone/DataSpa.

  7. These functions deal with two important difficulties that arise when constructing panels of municipality data in Spain. First, they control for municipality entries and removals, which take place almost every year, adapting the final data frame to the configuration of the last period. Second, they produce a list of name equivalences, based on the information provided by the INE, to handle the constant changes in municipality names, always assigning the name corresponding to the last period.

  8. www.dgt.es.

  9. http://www.dgt.es/es/seguridad-vial/estadisticas-e-indicadores/informacion-municipal.

  10. https://www.pdf2txt.com.

  11. http://www.ine.es/dynt3/inebase/es/index.htm?padre=51&dh=1.

  12. http://www.minetad.gob.es/industria/RII/Paginas/Index.aspx.

  13. http://www.camerdata.es/index.php.

  14. https://www.bvdinfo.com/en-gb/our-products/data/national/sabi.

  15. http://www.gem-spain.com.

  16. https://www.axesor.es.
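The panel adjustment described in Note 7 can be sketched in base R as follows. This is a simplified stand-in, not the package's own code: the function `harmonize_panel`, the municipality codes, the names, and the population figures are all invented for illustration, and matching is done on codes alone rather than through the INE-based list of name equivalences.

```r
# Toy data: codes, names and populations are invented for illustration.
pop_2015 <- data.frame(code = c("06001", "06002", "06003"),
                       name = c("Alfa", "Beta", "Gama"),
                       pop  = c(100, 200, 300),
                       stringsAsFactors = FALSE)
pop_2016 <- data.frame(code = c("06001", "06003", "06004"),
                       name = c("Alfa", "Gamma", "Delta"),
                       pop  = c(110, 310, 50),
                       stringsAsFactors = FALSE)

harmonize_panel <- function(years, last) {
  # Reference configuration: codes and names of the last period
  ref <- years[[last]][, c("code", "name")]
  out <- lapply(names(years), function(y) {
    # Left join on code: keeps only last-period municipalities and
    # leaves NA where a municipality did not exist in year y
    m <- merge(ref, years[[y]][, c("code", "pop")],
               by = "code", all.x = TRUE)
    m$year <- as.integer(y)
    m
  })
  # Stack the yearly data frames into one long panel
  do.call(rbind, out)
}

panel <- harmonize_panel(list("2015" = pop_2015, "2016" = pop_2016),
                         last = "2016")
print(panel)
```

In this sketch, a municipality removed before the last period is dropped, one created later appears with a missing value in earlier years, and every row carries the last-period name. Changes that alter a municipality's code, such as mergers, would need an explicit equivalence table.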

Acknowledgements

This work was supported by the Spanish Ministry of Economics and Competitiveness (ECO2015-65758-P) and the Regional Government of Extremadura (Spain). The usual disclaimers apply.

Author information

Corresponding author

Correspondence to Coro Chasco.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Vallone, A., Chasco, C. & Sánchez, B. Strategies to access web-enabled urban spatial data for socioeconomic research using R functions. J Geogr Syst 22, 217–239 (2020). https://doi.org/10.1007/s10109-019-00309-y

  • DOI: https://doi.org/10.1007/s10109-019-00309-y
