Incorporating travel behavior regularity into passenger flow forecasting

https://doi.org/10.1016/j.trc.2021.103200

Highlights

  • Traditional time series forecasting ignores the causal structure in passenger demand.

  • We introduce a novel concept, the return probability parallelogram (RPP), to encode such long-range dependency.

  • We demonstrate the effectiveness of this approach using real-world data.

  • The single covariate greatly enhances forecasting accuracy for most stations.

Abstract

Accurate forecasting of passenger flow (i.e., ridership) is critical to the operation of urban metro systems. Previous studies mainly model passenger flow as time series by aggregating individual trips and then perform forecasting based on the values in the past several steps. However, this approach essentially overlooks the fact that passenger flow consists of trips from each individual traveler. For example, a traveler’s work trip in the morning can help predict his/her home trip in the evening, while this causal structure cannot be explicitly encoded in standard time series models. In this paper, we propose a new forecasting framework for boarding flow by incorporating the generative mechanism into standard time series models and leveraging the strong regularity rooted in travel behavior. In doing so, we introduce returning flow from previous alighting trips as a new covariate, which captures the causal structure and long-range dependencies in passenger flow data based on travel behavior. We develop the return probability parallelogram (RPP) to summarize the causal relationships and estimate the returning flow. The proposed framework is evaluated using real-world passenger flow data, and the results confirm that the returning flow, a single covariate, can substantially and consistently improve various forecasting tasks, including one-step ahead forecasting, multi-step ahead forecasting, and forecasting under special events. The proposed method is more effective for business-type stations, where most passengers arrive and return within the same day. This study can be extended to other modes of transport, and it also sheds new light on general demand time series forecasting problems, in which causal structure and long-range dependencies are generated by user behavior.

Introduction

Recent years have witnessed the rapid development of metro systems and the continued growth of metro ridership worldwide (UITP, 2018). As an efficient and high-capacity transportation mode, the metro is playing an increasingly important role in shaping future sustainable transportation. Given the growing importance of metro systems, it is critical to have a good understanding of passenger demand patterns to support service operation. A key task is to produce accurate, real-time forecasts of passenger demand/ridership, which play a vital role in a wide range of applications, including service scheduling, crowd management, and disruption response, to name but a few.

Short-term passenger flow forecasting typically focuses on forecasting the passenger flow in the next few minutes to several hours, and has been extensively studied in public transportation research. Most existing studies formulate passenger flow data as time series and follow methods similar to those applied in traffic flow forecasting. For example, statistical time series models have been widely applied to ridership forecasting problems, including auto-regressive integrated moving average (ARIMA) (Williams and Hoel, 2003, Ding et al., 2017, Chen et al., 2020a), exponential smoothing (Tan et al., 2009), and state-space/Kalman filter (Stathopoulos and Karlaftis, 2003, Jiao et al., 2016). Most of these classical time series models are linear by nature; to better characterize the non-linearity in time series data, non-linear versions or ensemble extensions of these models have also been studied (e.g., Jiao et al., 2016, Carrese et al., 2017). More recent research treats forecasting as a supervised machine learning problem. On this track, some representative supervised learning models have been applied, such as support vector machine (SVM) (Chen et al., 2011, Sun et al., 2015), artificial neural network (ANN) (Vlahogianni et al., 2005, Tsai et al., 2009, Li et al., 2017), random forest (Toqué et al., 2017), and recurrent neural network (RNN)/long short-term memory (LSTM) as emerging deep learning approaches (Hao et al., 2019, Liu et al., 2019). The aforementioned research mainly focuses on modeling a univariate time series for a single metro station. However, the metro system is a network in which stations exhibit strong spatial and temporal correlations/dependencies. To extend the univariate analysis to network-wide passenger flow forecasting, some state-of-the-art models have been proposed to better characterize the complex spatiotemporal patterns and dynamics. For example, Gong et al. (2020) proposed matrix factorization models to estimate passenger flow data for each origin–destination (OD) pair; Li et al. (2019) introduced a local smoothness prior based on auxiliary information (e.g., flow correlation, network topology, and POI composition) into tensor completion models to forecast passenger flow; Chen et al. (2020b) developed graph convolutional network (GCN) models to capture the complex spatiotemporal dependencies in a metro network. These new machine learning-based models have shown superior performance over traditional time series models, and they are more effective in capturing the complex patterns by incorporating domain knowledge and external features such as weather, event, time of day, and day of week.

In all the studies mentioned above, passenger flow data is generally modeled as an aggregated count time series obtained by counting the number of unique card IDs in smart card transactions. Despite the simplicity and effectiveness of these models, we would argue that the most important characteristic of passenger flow is overlooked due to the aggregation: passenger flow consists of the movement of individuals with strong regularity rooted in their travel behavior. For instance, if a passenger alights at a metro station for work in the morning, he/she will probably depart from the same station when he/she goes home in the evening. If he/she does not travel in the morning, it becomes less likely that we will observe a corresponding return trip. This example shows that past trips should be utilized to predict future demand, and that individual travel behavior can produce causal structure and long-range dependencies in passenger flow time series data. Some recent studies have shown that travel behavior plays a substantial role in dynamic traffic assignment (Cantelmo and Viti, 2019) and online demand estimation (Cantelmo et al., 2020). This effect is particularly pronounced in metro systems, where passengers’ travel patterns are highly regular (Sun et al., 2013, Goulet-Langlois et al., 2017, Zhao et al., 2018b). Therefore, when developing a passenger flow forecasting model, it is essential to integrate these behavior-driven, long-range dependencies in addition to the local input (e.g., the past n steps in the time series).

The goal of this study is to explore the potential of incorporating an additional travel behavior component into the forecasting of passenger flow time series. Specifically, we propose a new scheme to forecast boarding/incoming passenger demand at a station by integrating historical alighting time series at the same station. We define returning passengers as those who finish their first trip at station s and also start their second trip at the same station. In other words, returning passengers refer to the individuals who stay at station s to perform an activity (e.g., home and work). In general, these return trips are not random and often exhibit strong regularity due to the activities performed. This motivates us to forecast the incoming/boarding demand from these “returning passengers” using the information on their previous trips. To achieve this, we introduce a new concept of return probability parallelogram (RPP) to better estimate returning flow, and we find that the estimated returning flow highly correlates with the overall boarding demand in a real-world data set. To further quantify the benefits of incorporating this returning flow measure, we evaluate the proposed models for one-step ahead forecasting, multi-step ahead forecasting, and forecasting under special events. Our results show that incorporating returning flow as an additional variable will consistently improve the accuracy of forecasting.
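
The idea of estimating returning flow from earlier alighting counts can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that return behavior is summarized by a table `return_prob[i][j]` giving the probability that a passenger who alighted in interval `i` boards again at the same station in interval `j`. The function name, the table layout, and all numbers are hypothetical; they are not the paper's RPP definition or data.

```python
from typing import List


def expected_returning_flow(alighting: List[float],
                            return_prob: List[List[float]]) -> List[float]:
    """Expected returning boardings per interval (illustrative assumption).

    alighting[i]      : passengers alighting at the station in interval i
    return_prob[i][j] : assumed probability that a passenger who alighted in
                        interval i boards again at the same station in
                        interval j (only j > i is used)
    """
    n = len(alighting)
    flow = [0.0] * n
    for i in range(n):
        # A return trip can only happen after the alighting interval.
        for j in range(i + 1, n):
            flow[j] += alighting[i] * return_prob[i][j]
    return flow


# Toy day with four intervals: morning, midday, afternoon, evening.
alighting = [100.0, 20.0, 10.0, 0.0]
return_prob = [
    [0.0, 0.1, 0.2, 0.6],  # morning arrivals mostly return in the evening
    [0.0, 0.0, 0.3, 0.5],
    [0.0, 0.0, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.0],
]
returning = expected_returning_flow(alighting, return_prob)
print(returning)
```

In this toy setting the evening interval receives the largest expected returning flow because most of the morning alightings are assumed to return then, which mirrors the work-trip example in the text.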

The idea of leveraging trip-level information has been introduced and examined in some recent studies, which predict the alighting flow of a station using the recent boarding flow from other related stations (see e.g., Li et al., 2017, Hao et al., 2019, Liu et al., 2019). However, the large number of boarding-alighting station pairs makes it difficult to learn an informative model at a trip level, and eventually these studies develop deep neural networks to learn the correlation from the aggregated count data in a purely data-driven approach. Our model, instead, uses the alighting of “this trip” to predict the boarding of the “next trip”, where the alighting and the boarding stations are usually the same (Barry et al., 2002, Trépanier et al., 2007). We examine this idea on a boarding flow forecasting application, which is more important to service operation and planning. The “returning flow” proposed in this paper is solely based on the intrinsic travel regularity of travelers and it does not require external information/knowledge. Our work is closely related to Zhao et al. (2018b), which proposes a probabilistic model to predict the next trip for an individual based on his/her trip history. However, instead of predicting individual trips, our primary goal is to forecast the overall passenger flow to support decision making in service operation. In doing so, we estimate the returning flow in an aggregated approach; therefore, the framework does not require individual-based data sets that are confidential and sensitive for privacy reasons. The main contributions of this work are summarized as follows.

  • We define returning flow to characterize the causal structure and long-range dependencies in passenger flow data, which are essentially overlooked in previous time series-based studies.

  • We integrate returning flow as an additional covariate into standard time series models, and the proposed behavior-integrated model shows consistently improved performance in our case studies based on a real-world data set.

  • Our model also provides a new approach to forecast passenger flows under special events.
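
To make the second contribution concrete, the sketch below shows what adding returning flow as one extra covariate looks like mechanically. It is not the paper's model: instead of SARIMA it uses a plain one-lag linear regression fitted by ordinary least squares, and the data are synthetic and noise-free by construction, so the improvement here is guaranteed and says nothing about real-world accuracy. All names and numbers are hypothetical.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def sse(rows: Sequence[Sequence[float]], y: Sequence[float],
        beta: Sequence[float]) -> float:
    """Sum of squared in-sample residuals."""
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, y))


# Synthetic toy data (not from the paper): boarding depends on its own
# previous value and on the returning-flow covariate, with no noise.
returning = [0.0, 10.0, 26.0, 78.0, 5.0, 12.0, 30.0, 70.0]
boarding = [20.0]
for t in range(1, len(returning)):
    boarding.append(5.0 + 0.2 * boarding[t - 1] + 1.5 * returning[t])

targets = boarding[1:]
# Baseline: intercept + previous boarding count only.
rows_base = [[1.0, boarding[t - 1]] for t in range(1, len(boarding))]
# Behavior-integrated variant: same regressors plus returning flow.
rows_full = [[1.0, boarding[t - 1], returning[t]]
             for t in range(1, len(boarding))]

beta_base = fit_ols(rows_base, targets)
beta_full = fit_ols(rows_full, targets)
sse_base = sse(rows_base, targets, beta_base)
sse_full = sse(rows_full, targets, beta_full)
print("baseline SSE:", round(sse_base, 3))
print("with returning-flow covariate SSE:", round(sse_full, 6))
```

The only difference between the two fits is the third column of the design matrix, which is the sense in which returning flow enters as a single additional covariate.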

To the best of our knowledge, this is the first research that incorporates a travel behavior component into the longstanding passenger flow forecasting problem. The remainder of the paper is organized as follows. Section 2 introduces the concept of returning flow and return probability parallelogram as the tool to integrate travel behavior regularity into the passenger flow forecasting framework. In Section 3, we develop case studies based on real-world smart card data and demonstrate the effectiveness of the proposed models in different scenarios. Finally, Section 4 concludes our research and discusses future work.

Section snippets

Methodology

In this section, we introduce returning flow and the return probability parallelogram as two fundamental building blocks in the behavior-based boarding flow forecasting framework. The proposed forecasting models are constructed by integrating returning flow as a new feature/covariate into traditional time series forecasting models. We start with a brief description of the passenger flow forecasting problem.

Experiments

In this section, we conduct numerical experiments to evaluate the effectiveness of the proposed behavior-integrated models. We choose the standard SARIMA model as the core model for time series forecasting (M0). On top of this model, we create two regression with SARIMA error models, M1 and M2, by simply incorporating the observed $r_t^s$ and the estimated $\hat{r}_{t+1}^s$ as additional covariates, respectively. We evaluate the performance of these models in three scenarios: 1) one-step ahead forecasting, 2)

Conclusions and discussion

In this paper, we propose a new framework for forecasting passenger flow time series in metro systems. In contrast to some previous studies that capture temporal dynamics in a data-driven way, we try to incorporate the generative mechanisms rooted in travel behavior into modeling passenger flow time series. For that purpose, we introduce returning flow as a new covariate/feature into standard time series models. This returning flow is estimated as the expected returning boarding demand given

Acknowledgement

This research is supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada, Mitacs, exo.quebec (https://exo.quebec/en), NSFC-FRQSC Research Program on Smart Cities and Big Data, the Institute for Data Valorisation (IVADO), and the Canada Foundation for Innovation (CFI).

References

  • T.H. Tsai et al. Neural network based temporal feature models for short-term railway passenger demand forecasting. Expert Syst. Appl. (2009)

  • E.I. Vlahogianni et al. Optimized and meta-optimized neural networks for short-term traffic flow prediction: A genetic approach. Transportation Research Part C: Emerging Technologies (2005)

  • Z. Zhao et al. Detecting pattern changes in individual travel behavior: A Bayesian approach. Transportation Research Part B: Methodological (2018)

  • Z. Zhao et al. Individual mobility prediction using transit smart card data. Transportation Research Part C: Emerging Technologies (2018)

  • H. Akaike. Information theory and an extension of the maximum likelihood principle. Selected Papers of Hirotugu Akaike. Springer (1998)

  • J.J. Barry et al. Origin and destination estimation in New York City with automated fare system data. Transp. Res. Rec. (2002)

  • E. Chen et al. Subway passenger flow prediction for special events using smart card data. IEEE Trans. Intell. Transp. Syst. (2020)

  • Chen, J., Liu, L., Wu, H., Zhen, J., Li, G., Lin, L., 2020b. Physical-virtual collaboration graph network for...