Novel Three-Stage Framework for Prioritizing and Selecting Feature Variables for Short-Term Metro Passenger Flow Prediction
Transportation Research Record: Journal of the Transportation Research Board (IF 1.6), Pub Date: 2020-07-06, DOI: 10.1177/0361198120926504
Yangyang Zhao¹, Lu Ren², Zhenliang Ma², Xinguo Jiang¹

Abstract

Short-term metro passenger flow prediction is vital for the operation and management of metro systems. Most studies focus on achieving higher prediction accuracy with statistical and machine learning methods, but little attention has been paid to the prioritization and selection of feature variables, especially for different types of metro station. This study analyzes the effect of feature variables on the prediction results and then selects appropriate predictor variables accordingly. A novel three-stage framework, consisting of station clustering, feature extraction, and variable prioritization, is proposed to prioritize feature variables for short-term metro passenger flow prediction. An agglomerative hierarchical clustering (AHC) algorithm is developed for station clustering, and the resulting clusters are verified with K-means and the Davies-Bouldin (DB) index. Temporal, spatial, and external features are then extracted. Finally, the association between the variables and the prediction results is explored using tree-based models. The proposed framework is demonstrated and validated with data collected from the Shanghai Metro Automatic Fare Collection (AFC) system. The results show that the importance of feature variables in model development varies between stations, whereas only a few variables are found to explain most of the variation in the testing dataset. Different feature variables lead to distinct differences in prediction accuracy, and simply adding more predictor variables does not necessarily lead to higher accuracy. In addition, station type and prediction type (i.e., tap-in versus tap-out) have little influence on the selection of feature variables.
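The station-clustering stage can be illustrated with a minimal pure-Python sketch; it is not the authors' implementation. It assumes each station is described by a numeric feature vector (for example, normalized flow in a few time bands), uses Euclidean distance and average linkage (the abstract does not state the linkage), and keeps the number of clusters with the lowest Davies-Bouldin index. The K-means cross-check is omitted, and all function names and the example vectors are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def agglomerative(points, k):
    """Average-linkage agglomerative clustering down to k clusters of point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster pairs.
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            d = sum(euclidean(points[i], points[j]) for i, j in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations() yields a < b, so index a is unaffected
    return clusters


def davies_bouldin(points, clusters):
    """Davies-Bouldin index (lower is better); assumes >= 2 clusters with distinct centroids."""
    cents = [centroid([points[i] for i in c]) for c in clusters]
    # Scatter s_i: mean distance of a cluster's members to its centroid.
    scatter = [sum(euclidean(points[i], cents[n]) for i in c) / len(c)
               for n, c in enumerate(clusters)]
    # For each cluster, the worst (s_i + s_j) / d(c_i, c_j) over all other clusters.
    worst = [max((scatter[i] + scatter[j]) / euclidean(cents[i], cents[j])
                 for j in range(len(clusters)) if j != i)
             for i in range(len(clusters))]
    return sum(worst) / len(worst)


def choose_k(points, k_values):
    """Cluster once per k (each k >= 2) and keep the partition with the lowest DB index."""
    best = None
    for k in k_values:
        clusters = agglomerative(points, k)
        db = davies_bouldin(points, clusters)
        if best is None or db < best[2]:
            best = (k, clusters, db)
    return best


if __name__ == "__main__":
    # Hypothetical station feature vectors (e.g., normalized flow in four time bands).
    stations = [[0.9, 0.1, 0.1, 0.1], [0.8, 0.2, 0.1, 0.1], [0.85, 0.15, 0.05, 0.1],
                [0.1, 0.1, 0.2, 0.9], [0.1, 0.2, 0.1, 0.8], [0.15, 0.1, 0.15, 0.85]]
    k, clusters, db = choose_k(stations, range(2, 5))
    print(k, clusters, round(db, 3))
```

Average linkage is only one choice; Ward or complete linkage would change the merge criterion in `agglomerative` but not the DB-based selection of k. The naive merge loop is roughly cubic in the number of stations, which is manageable for a few hundred stations.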




Updated: 2020-07-07