Abstract
Station-level ridership prediction plays a critical role in subway transportation planning. Among the existing ridership prediction methods, the direct demand model has been recognized as an effective approach. However, direct demand models, including geographically weighted regression (GWR), have rarely been studied with respect to local model selection in ridership prediction. In practice, understanding how multiple factors influence subway ridership from a local perspective is important for passenger flow management and for transportation planning that adapts to local conditions. In this study, we propose an adapted geographically weighted LASSO (Ada-GWL) framework for modelling subway ridership, which involves regression-coefficient shrinkage and local model selection. It takes the subway network layout into account and adopts a network-based distance metric instead of a Euclidean distance metric, which is what adapts it to the context of subway networks. The real-world case of the Shenzhen Metro is used to illustrate the proposed model. The results show that the proposed Ada-GWL model performs best, in terms of estimation error and goodness-of-fit, when compared with the global model (ordinary least squares), GWR, GWR calibrated with a network-based distance metric, and geographically weighted LASSO (GWL). By revealing how each coefficient (elasticity) varies across space and which variables are selected at each station, the model supports more realistic conclusions based on local analysis. In addition, when the stations are clustered according to their regression coefficients, the functional characteristics of the clusters are found to be consistent with the functional land-use policy in Shenzhen, indicating the high spatial interpretability of the Ada-GWL model. In other words, the regression coefficients of different stations provide a local perspective for understanding how the factors influence each station's ridership.
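The difference between a network-based and a Euclidean distance metric can be illustrated with a minimal sketch. It is not the authors' implementation: the toy network, the link lengths, the Gaussian kernel, and the bandwidth value are all assumptions made for illustration. The distance between two stations is taken as the shortest path along track links, and that distance is then turned into the kind of spatial weight a geographically weighted model would use.

```python
import heapq
import math


def network_distances(adjacency, source):
    """Shortest-path distance along track links from `source` (Dijkstra)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for neighbor, length in adjacency[node]:
            nd = d + length
            if nd < dist.get(neighbor, math.inf):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist


def gaussian_weights(distances, bandwidth):
    """Gaussian kernel weights; the kernel choice is an assumption."""
    return {s: math.exp(-0.5 * (d / bandwidth) ** 2) for s, d in distances.items()}


# Hypothetical three-station line A - B - C with 2 km links.
toy_adjacency = {
    "A": [("B", 2.0)],
    "B": [("A", 2.0), ("C", 2.0)],
    "C": [("B", 2.0)],
}

toy_dist = network_distances(toy_adjacency, "A")
toy_weights = gaussian_weights(toy_dist, bandwidth=2.0)
```

Under this sketch, two stations that are close on the map but far apart along the tracks receive a small weight, which is the behaviour a network-based metric is meant to capture.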
Notes
Source: https://map.baidu.com/.
Source: https://maps.google.com.
Average daily ridership over a whole week is the total ridership of the seven days of the week during operating hours (6:30–23:00), divided by seven. Rush-hour ridership is the total ridership of the evening rush hours (17:00–19:00) over a whole week, divided by 14 h (2 h × 7 days). Non-rush-hour ridership is the total ridership of the remaining periods (9:00–17:00 and 19:00–23:00), that is, excluding the morning (7:00–9:00) and evening rush hours, over a whole week, divided by 84 h (12 h × 7 days).
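The arithmetic in this note can be sketched as follows. The function names and the weekly passenger counts are hypothetical; only the time windows and the divisors (7 days, 14 h, 84 h) come from the note.

```python
DAYS_PER_WEEK = 7
EVENING_RUSH_HOURS_PER_DAY = 19 - 17            # 17:00-19:00 -> 2 h
NON_RUSH_HOURS_PER_DAY = (17 - 9) + (23 - 19)   # 9:00-17:00 and 19:00-23:00 -> 12 h


def average_daily_ridership(weekly_total):
    """Weekly total during operating hours, averaged over seven days."""
    return weekly_total / DAYS_PER_WEEK


def hourly_ridership(weekly_total_in_window, hours_per_day):
    """Weekly total inside a daily time window, divided by the window's weekly hours."""
    return weekly_total_in_window / (hours_per_day * DAYS_PER_WEEK)


# Hypothetical weekly counts for a single station.
daily = average_daily_ridership(700_000)
rush = hourly_ridership(28_000, EVENING_RUSH_HOURS_PER_DAY)
non_rush = hourly_ridership(84_000, NON_RUSH_HOURS_PER_DAY)
```

Expressing rush-hour and non-rush-hour ridership as hourly rates makes the two comparable even though their time windows differ in length.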
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China (No. 71901188), and the Research Grants Council Theme-based Research Scheme (No. T32-101/15-R). The authors would like to thank Professor Jian Ma from Southwest Jiaotong University for providing the Shenzhen metro AFC data.
Author information
Contributions
YH: Original idea, Literature Search and Review, Data Collection and Analysis, Manuscript Writing. YZ: Modelling, Content planning, Data Analysis, Manuscript Editing. KLT: Content planning, Manuscript editing.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendix
Summary of literature review on direct demand models for ridership prediction
The related studies on direct demand models for ridership prediction are summarized in Figs. 15 and 16.
Definitions of stations’ identifiers
For ease of representation and understanding, we use alphanumeric codes instead of Chinese names to denote the stations. The identifiers are defined according to the following rules: (1) a non-transfer station is denoted by 3 digits, where the first digit is the line number and the remaining 2 digits are the sequential number of the station on that line; (2) a transfer station is denoted by the character t followed by 3 digits, where the first 2 digits are the two intersecting lines and the last digit is the sequential number of the intersection between those two lines. For example, “402” represents the 2nd station of Line 4, and “t131” represents the transfer station that is the first intersection of Lines 1 and 3. In this way, all 118 stations can be encoded by identifiers that literally contain the line and station information (Figs. 11 and 13).
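The two encoding rules can be illustrated with a small decoder. This is a sketch, not part of the original study: the function name and the returned dictionary layout are hypothetical, and it assumes single-digit line numbers, as in the examples “402” and “t131”.

```python
import re

# Rule (1): line digit followed by a two-digit station sequence number.
_NON_TRANSFER = re.compile(r"(\d)(\d{2})")
# Rule (2): "t", the two intersecting lines, then the intersection sequence number.
_TRANSFER = re.compile(r"t(\d)(\d)(\d)")


def parse_station_id(identifier):
    """Decode a station identifier into its line(s) and sequence number."""
    match = _NON_TRANSFER.fullmatch(identifier)
    if match:
        return {
            "transfer": False,
            "lines": (int(match.group(1)),),
            "sequence": int(match.group(2)),
        }
    match = _TRANSFER.fullmatch(identifier)
    if match:
        return {
            "transfer": True,
            "lines": (int(match.group(1)), int(match.group(2))),
            "sequence": int(match.group(3)),
        }
    raise ValueError(f"not a valid station identifier: {identifier!r}")


station_402 = parse_station_id("402")
station_t131 = parse_station_id("t131")
```

For the two examples in the text, the decoder reports Line 4, station 2 for “402”, and the first intersection of Lines 1 and 3 for “t131”.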
About this article
Cite this article
He, Y., Zhao, Y. & Tsui, K.L. An adapted geographically weighted LASSO (Ada-GWL) model for predicting subway ridership. Transportation 48, 1185–1216 (2021). https://doi.org/10.1007/s11116-020-10091-2
DOI: https://doi.org/10.1007/s11116-020-10091-2