Applying Bayesian spatio-temporal models to demand analysis of shared bicycle

https://doi.org/10.1016/j.physa.2021.126296

Highlights

  • Analyze the regional travel demand of shared bicycles from the perspective of time and space.

  • Combining the Integrated Nested Laplace Approximation with the Stochastic Partial Differential Equation approach keeps the algorithms computationally feasible on large-scale data sets.

  • Use the Kolmogorov–Smirnov test to compare candidate distributions for shared bicycle travel demand and identify the best-fitting distribution type.

Abstract

Shared bicycles provide a cheap and healthy mobility alternative for travelers, especially for “first–last mile” trips. Although the temporal and spatial correlation of regional shared bicycle demand has been confirmed in the literature in recent years, the interdependence between the two is not yet fully understood. In this paper, a spatio-temporal Bayesian modeling method is proposed to quantify regional shared bicycle demand and identify the impact of various factors on cycling trips. Combining the Integrated Nested Laplace Approximation (INLA) with the Stochastic Partial Differential Equation (SPDE) approach keeps the algorithms computationally feasible on large-scale spatio-temporal data structures. The massive rental records of Mobike in Shanghai in August 2016 are used as the study observations. We establish a series of Bayesian models with different temporal and spatial structures and use the Deviance Information Criterion (DIC) to verify the relevance of the models in the temporal and spatial dimensions. Moreover, the Kolmogorov–Smirnov test is used to compare how well different distributions fit the travel demand data and to obtain the optimal distribution family. We further evaluate the impact of meteorological factors, population density and per capita GDP on travel demand. The results show that the model with a spatio-temporal correlation structure better assesses the regional distribution of future shared bicycle riding demand, and that temperature and precipitation have a more significant influence on cycling demand. The findings will help guide future regional scheduling of shared bicycles and improve economic benefits while meeting travelers’ needs.
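The distribution-selection step mentioned in the abstract can be sketched with SciPy's `scipy.stats.kstest`. This is a minimal illustration under stated assumptions, not the authors' procedure: the candidate families (normal, lognormal, gamma, exponential), fixing the location at zero for the positive families, and the synthetic gamma-distributed "demand" values are all assumptions. Because the parameters are estimated from the same data, the KS p-values are optimistic, and for discrete count data the KS statistic is only an approximate guide, so the sketch ranks fits by the statistic itself.

```python
import numpy as np
from scipy import stats

# Assumed candidate families; positive-support families have loc fixed at 0.
CANDIDATES = {
    "norm": {},
    "lognorm": {"floc": 0},
    "gamma": {"floc": 0},
    "expon": {"floc": 0},
}


def rank_distributions(data):
    """Fit each candidate by maximum likelihood and compute its KS statistic."""
    results = {}
    for name, fit_kwargs in CANDIDATES.items():
        dist = getattr(stats, name)
        params = dist.fit(data, **fit_kwargs)
        ks = stats.kstest(data, name, args=params)
        results[name] = (ks.statistic, ks.pvalue)
    # Smallest KS distance = fitted CDF closest to the empirical CDF.
    best = min(results, key=lambda k: results[k][0])
    return best, results


rng = np.random.default_rng(0)
demand = rng.gamma(shape=2.0, scale=30.0, size=500)  # synthetic, not Mobike data
best, results = rank_distributions(demand)
for name, (d, p) in sorted(results.items(), key=lambda kv: kv[1][0]):
    print(f"{name:8s} D={d:.3f} p={p:.3f}")
```

With real data, `demand` would be replaced by the observed regional demand values, and the family with the smallest KS distance would be carried into the Bayesian model as the likelihood.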

Introduction

For short-distance urban travel, shared bicycles have great advantages in alleviating traffic congestion, connecting to the public transportation system and improving the operating efficiency of urban transportation. To a certain extent, shared bicycles have solved the first–last mile problem. Since their introduction, shared bicycles have strongly promoted the rapid development of public transportation and slow (non-motorized) traffic. However, some challenging issues remain, such as the unbalanced temporal and spatial distribution of rentals and returns and unreasonable regional scheduling [1].

In recent years, researchers have paid much attention to the regional scheduling problem of shared bicycles. Deng Chao et al. combined the characteristics of historical rental and return data of shared bicycles with the Analytic Hierarchy Process to analyze the factors affecting the utilization rate of shared bicycle stations. The results showed that rentals and returns were significantly higher at stations near commercial and residential areas, confirming the spatial correlation of shared bicycle travel [2]. Divya et al. showed that shared bicycle usage is correlated with taxi usage, weather and spatial location [3]. Caggiani et al. used spatio-temporal clustering to study the travel demand of shared bicycles and proposed a dynamic area-clustering algorithm to determine the optimal number of shared bicycle stations [4]. Benchimol et al. used the known demand at shared bicycle stations to solve the regional scheduling problem with a static scheduling model that minimizes scheduling cost [5]. Reiss analyzed GPS data of shared bicycles and investigated their temporal and spatial patterns [6]. Yongping Zhang et al. proposed the concept of bicycle islands, which combines high-resolution trajectory data with spatial and geographical regions, and found that the formation of bicycle islands is closely related to the surrounding land use [7]. Building on previous studies, Sergio Guidon et al. used a spatial regression model and a random forest model to predict shared bicycle demand. The spatial regression model, which accounts for spatial lag and spatial error, outperformed the random forest, which further shows that shared bicycle demand is spatially correlated [8]. Wen zhen Jia et al. proposed a two-level Gaussian mixture model clustering algorithm that groups bicycle stations while taking into account the migration of shared bicycles between stations and their geographical locations; a gradient boosting regression tree was then used to predict traffic. To a certain extent, this addresses the imbalance of shared bicycle rentals across times and sites [9]. Tao et al. proposed a practical data-driven method for the regional bike-sharing allocation problem based on an integer-valued data envelopment analysis (DEA) model. Each shared bicycle parking area is treated as an entity, with the area, the number of allocated bicycles and other indicators as inputs and the number of shared bicycles used as the output; the efficiency of each entity is the ratio of output to input [10]. Wang et al. proposed an improved recurrent neural network (RNN) for shared bicycle demand and showed that an RNN with a more complex structure can achieve long-term prediction while maintaining accuracy [11].

At present, most research on the allocation of and demand for shared bicycles relies on historical travel data aggregated by region or station and uses machine learning algorithms to model short-term shared bicycle demand in the time and space dimensions [12], [13]. However, the interdependence between the temporal and spatial correlation of shared bicycle riding demand has not yet been fully explained or quantified.

To address these issues, this study builds a shared bicycle riding demand model from the perspective of time and space and estimates it with the Integrated Nested Laplace Approximation (INLA) method [14]. We then analyze how the interaction between temporal and spatial structure factors affects shared bicycle riding demand. INLA is naturally suited to spatio-temporal models, and, combined with the Stochastic Partial Differential Equation (SPDE) method, it allows the Bayesian model to be computed quickly. Markov Chain Monte Carlo (MCMC) is also commonly used to fit Bayesian spatio-temporal models, but it is computationally expensive: it requires many iterations of sampling, and analyzing a large data set often takes several days. INLA avoids this iterative sampling and fits the model quickly without sacrificing accuracy [15]. INLA has been embedded in a variety of models and applied in many fields [16], [17], [18], [19], [20], but its application in transportation is still rare.
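The building block that lets INLA avoid sampling is the Laplace approximation: a posterior is replaced by a Gaussian centred at its mode, with variance equal to the inverse negative curvature of the log-posterior there. The sketch below applies it to a single Poisson count with a log link and a Gaussian prior on the log-rate, and compares the result with brute-force integration on a grid. The model, the N(0, 10²) prior and the count y = 20 are illustrative assumptions; INLA itself nests such approximations over latent fields and hyperparameters and is considerably more elaborate.

```python
import numpy as np


def log_post(eta, y, prior_mean, prior_sd):
    """Unnormalized log posterior of the log-rate eta for y ~ Poisson(exp(eta))."""
    return y * eta - np.exp(eta) - 0.5 * ((eta - prior_mean) / prior_sd) ** 2


def laplace_approx(y, prior_mean=0.0, prior_sd=10.0, max_iter=100):
    """Newton's method for the posterior mode, then a Gaussian fit at the mode."""
    eta = float(np.log(y + 0.5))  # starting value
    for _ in range(max_iter):
        grad = y - np.exp(eta) - (eta - prior_mean) / prior_sd**2
        hess = -np.exp(eta) - 1.0 / prior_sd**2
        step = grad / hess
        eta -= step  # Newton update: eta - grad / hess
        if abs(step) < 1e-12:
            break
    hess = -np.exp(eta) - 1.0 / prior_sd**2  # curvature at the mode
    return float(eta), float(np.sqrt(-1.0 / hess))


def grid_moments(y, prior_mean=0.0, prior_sd=10.0):
    """Posterior mean and sd by normalizing the density on a fine grid."""
    grid = np.linspace(-5.0, 10.0, 200_001)
    lp = log_post(grid, y, prior_mean, prior_sd)
    w = np.exp(lp - lp.max())
    w /= w.sum()
    mean = float(np.sum(w * grid))
    return mean, float(np.sqrt(np.sum(w * (grid - mean) ** 2)))


mode, sd = laplace_approx(y=20)
print("Laplace:", round(mode, 3), round(sd, 3))
print("Grid:   ", [round(v, 3) for v in grid_moments(y=20)])
```

The two answers agree closely without drawing a single sample, which is the property that makes INLA fast; MCMC would instead need many correlated draws to reach comparable precision.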

The rest of the paper is organized as follows. Section 2 explains the spatio-temporal model and the INLA–SPDE method. Section 3 introduces the input data set, describes how a series of Bayesian models with different spatio-temporal structures is selected, and gives a quantitative interpretation of the parameters of the spatio-temporal model. The final section summarizes the paper and points out possible research directions.
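Model selection by the Deviance Information Criterion, which this paper uses to compare spatio-temporal structures, follows DIC = D̄ + p_D with p_D = D̄ − D(θ̄), where D = −2 × log-likelihood, D̄ is its posterior mean and θ̄ is the posterior mean of the parameters; lower DIC is preferred. The sketch below computes DIC from posterior draws for two toy Poisson models (one pooled rate versus one rate per region) on synthetic counts. The Gamma(1, 0.01) prior, the synthetic data and the Monte Carlo route are assumptions for illustration; R-INLA can report DIC as part of a model fit from its own approximations rather than from draws like these.

```python
import numpy as np
from scipy import stats


def dic_poisson(y, rate_draws):
    """DIC for a Poisson likelihood.

    rate_draws has shape (S, n): S posterior draws of the mean of each of n observations.
    """
    deviance = -2.0 * stats.poisson.logpmf(y, rate_draws).sum(axis=1)
    d_bar = deviance.mean()                                    # posterior mean deviance
    d_hat = -2.0 * stats.poisson.logpmf(y, rate_draws.mean(axis=0)).sum()
    p_d = d_bar - d_hat                                        # effective no. of parameters
    return d_bar + p_d, p_d


rng = np.random.default_rng(1)
n_regions, n_days, S = 10, 30, 4000
true_rates = np.linspace(5.0, 50.0, n_regions)   # synthetic region effects
region = np.repeat(np.arange(n_regions), n_days)
y = rng.poisson(true_rates[region])
a0, b0 = 1.0, 0.01                               # assumed Gamma(shape, rate) prior

# Model 1: one pooled rate, drawn from its conjugate Gamma posterior.
pooled = rng.gamma(a0 + y.sum(), 1.0 / (b0 + y.size), size=S)
rates_m1 = np.broadcast_to(pooled[:, None], (S, y.size))

# Model 2: one rate per region, each from its own conjugate posterior.
sums = np.bincount(region, weights=y)
counts = np.bincount(region)
per_region = rng.gamma(a0 + sums, 1.0 / (b0 + counts), size=(S, n_regions))
rates_m2 = per_region[:, region]

dic1, pd1 = dic_poisson(y, rates_m1)
dic2, pd2 = dic_poisson(y, rates_m2)
print(f"pooled:     DIC={dic1:.1f}  pD={pd1:.2f}")
print(f"per-region: DIC={dic2:.1f}  pD={pd2:.2f}")
```

The per-region model pays for about ten effective parameters but fits far better, so its DIC is lower; the same trade-off governs the choice among temporal, spatial and spatio-temporal structures.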

Section snippets

Spatio-temporal demand model

Our analysis indicates that shared bicycle riding demand is affected by both temporal and spatial factors, so we developed several models with different structures. The general model structure is

$$y(s_i, t) = B(s_i, t)\,\beta + \xi(s_i, t) + \varepsilon(s_i, t),$$

where $y(s_i, t)$ is the demand for shared bicycles at location $s_i$ at time $t$, and $t$ indexes the time scale of interest. $B(s_i, t) = (B_1(s_i, t), B_2(s_i, t), \ldots, B_m(s_i, t))$ is the selected combination of $m$ covariates, and $\beta = (\beta_1, \beta_2, \ldots, \beta_m)^\top$ is the covariate
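To make the roles of the three terms in the general model structure concrete, the sketch below simulates data from it. The form of ξ is an assumption, since the snippet does not state it: a first-order autoregressive process in time, ξ(s_i, t) = a ξ(s_i, t−1) + ω(s_i, t), whose innovations ω are spatially correlated with a Matérn covariance of smoothness ν = 1. This choice is common in INLA–SPDE work because a Matérn field with ν = α − d/2 solves the SPDE (κ² − Δ)^{α/2}(τx) = W, and α = 2 in d = 2 dimensions gives ν = 1. The function names, parameter values and site layout are hypothetical.

```python
import numpy as np
from scipy.special import kv


def matern_nu1(dist, kappa, sigma):
    """Matern covariance with smoothness nu = 1: sigma^2 * (kappa*h) * K_1(kappa*h)."""
    kh = kappa * dist
    with np.errstate(invalid="ignore"):
        # At h = 0 the product is 0 * inf; its limit is 1, so set it explicitly.
        corr = np.where(kh > 0, kh * kv(1, kh), 1.0)
    return sigma**2 * corr


def simulate_demand(coords, B, beta, a=0.7, kappa=1.5, sigma_xi=1.0, sigma_eps=0.3, seed=0):
    """Simulate y(s_i, t) = B(s_i, t) beta + xi(s_i, t) + eps(s_i, t).

    coords: (n, 2) site locations; B: (n, T, m) covariates; beta: (m,) coefficients.
    xi is a stationary AR(1) in time with Matern(nu=1) spatial innovations.
    """
    rng = np.random.default_rng(seed)
    n, T, _ = B.shape
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    cov = matern_nu1(dist, kappa, sigma_xi)
    L = np.linalg.cholesky(cov + 1e-8 * np.eye(n))  # small jitter for stability
    xi = np.empty((n, T))
    xi[:, 0] = L @ rng.standard_normal(n) / np.sqrt(1.0 - a**2)  # stationary start
    for t in range(1, T):
        xi[:, t] = a * xi[:, t - 1] + L @ rng.standard_normal(n)
    eps = sigma_eps * rng.standard_normal((n, T))  # unstructured noise
    return B @ beta + xi + eps, xi


rng = np.random.default_rng(42)
coords = rng.uniform(0.0, 5.0, size=(25, 2))  # hypothetical site layout
B = np.concatenate([np.ones((25, 48, 1)), rng.normal(size=(25, 48, 2))], axis=2)
beta = np.array([3.0, 0.5, -0.8])  # intercept plus two covariate effects
y, xi = simulate_demand(coords, B, beta)
print(y.shape)
```

In the INLA–SPDE approach the dense covariance matrix built here is replaced by a sparse precision matrix on a triangulated mesh, which is what keeps large data sets tractable.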

Data collection and processing

The research area, Shanghai, China, lies between 120°52’ and 122°12’ east longitude and between 30°40’ and 31°53’ north latitude. It has a typical subtropical monsoon climate with four distinct seasons. To study factors that may influence the demand for shared bicycles, we analyze about 100,000 Shanghai Mobike orders randomly sampled from August 2016. The data set includes the order ID, user ID, longitude and latitude of the starting point and the ending point of the
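Before modeling, individual orders have to be aggregated into demand counts per region and time step. The sketch below assumes each record carries an order ID, a start timestamp and start coordinates; the timestamp field is an assumption because the field list above is cut off. The regular 0.05° grid anchored at the south-west corner of the study area is a hypothetical stand-in for the county-level administrative regions, which would require a point-in-polygon lookup against district boundaries. The records are made up.

```python
from collections import Counter
from datetime import datetime

# Hypothetical grid anchored at the south-west corner of the study area.
LON0, LAT0 = 120 + 52 / 60, 30 + 40 / 60
CELL_DEG = 0.05  # assumed cell size in degrees


def cell_of(lon, lat):
    """Map a coordinate to a (column, row) grid-cell index."""
    return int((lon - LON0) // CELL_DEG), int((lat - LAT0) // CELL_DEG)


def hourly_origin_counts(orders):
    """Count trips per (origin cell, date, hour) from (order_id, start_time, lon, lat)."""
    counts = Counter()
    for _order_id, start_time, lon, lat in orders:
        ts = datetime.fromisoformat(start_time)
        counts[(cell_of(lon, lat), ts.date().isoformat(), ts.hour)] += 1
    return counts


orders = [  # made-up records for illustration only
    ("o1", "2016-08-01 08:05:00", 121.47, 31.23),
    ("o2", "2016-08-01 08:40:00", 121.48, 31.24),
    ("o3", "2016-08-01 09:10:00", 121.47, 31.23),
]
print(hourly_origin_counts(orders))
```

Each key of the resulting counter corresponds to one observation y(s_i, t) in the demand model, with the cell playing the role of s_i and the date-hour pair the role of t.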

Conclusions and future work

This paper applies the INLA method to a spatio-temporal model of shared bicycle demand and measures the impact of meteorological factors, population density and per capita GDP on shared bicycle riding demand. The major findings of the spatio-temporal analysis can be summarized as follows: (1) Across the county-level administrative regions considered, most origin points and destination points fall in the same region, so origin points are used for modeling
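The observation that most trips start and end in the same region can be checked with a simple share: the fraction of trips whose origin and destination map to the same region. The sketch uses a hypothetical regular grid in place of the county-level districts, and the cell size and trip records are made up for illustration.

```python
from typing import Iterable, Tuple

LON0, LAT0 = 120 + 52 / 60, 30 + 40 / 60  # south-west corner of the study area
CELL_DEG = 0.05                           # hypothetical cell size in degrees

Trip = Tuple[float, float, float, float]  # (start_lon, start_lat, end_lon, end_lat)


def cell_of(lon: float, lat: float) -> Tuple[int, int]:
    """Map a coordinate to a (column, row) grid-cell index."""
    return int((lon - LON0) // CELL_DEG), int((lat - LAT0) // CELL_DEG)


def same_region_share(trips: Iterable[Trip]) -> float:
    """Fraction of trips whose origin and destination fall in the same cell."""
    total = same = 0
    for slon, slat, elon, elat in trips:
        total += 1
        same += cell_of(slon, slat) == cell_of(elon, elat)
    return same / total if total else float("nan")


trips = [  # made-up trips for illustration only
    (121.470, 31.230, 121.475, 31.232),  # short trip within one cell
    (121.470, 31.230, 121.520, 31.300),  # ends in a different cell
    (121.480, 31.240, 121.482, 31.241),
    (121.410, 31.180, 121.412, 31.183),
]
print(same_region_share(trips))
```

When this share is high, counting trips at their origins loses little information about where demand arises, which is the rationale for origin-based modeling.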

CRediT authorship contribution statement

Yimeng Duan: Conceptualization, Methodology, Data curation, Writing – original draft. Shen Zhang: Visualization, Investigation, Software, Supervision. Zhuoran Yu: Software, Validation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research was supported by the National Key Research and Development Program of China (2018YFB1600600).

References (30)

Cited by (5)

  • Demand forecasting of shared bicycles based on combined deep learning models

    2024, Physica A: Statistical Mechanics and its Applications
  • Large-scale dockless bike sharing repositioning considering future usage and workload balance

    2022, Physica A: Statistical Mechanics and its Applications
    Citation excerpt:

    Clustering is an unsupervised and widely used machine learning classification method, aiming to allocate different elements into some homogeneous groups. Because most of the origins and destinations are in the same place [45], the origins of trip data are used as the clustering input. In our former research [38], there are three clustering methods applied in determining virtual stations, including K-means, DBSCAN, and spatio-temporal clustering.
