Seoul bike trip duration prediction using data mining techniques
IET Intelligent Transport Systems (IF 2.7), Pub Date: 2020-11-02, DOI: 10.1049/iet-its.2019.0796
Sathishkumar V E, Jangwoo Park, Yongyun Cho

Trip duration is the most fundamental measure in all modes of transportation. Hence, it is crucial to predict trip time precisely for the advancement of Intelligent Transport Systems and traveller information systems. This study employs data mining techniques to predict the trip duration of rental bikes in the Seoul Bike sharing system. The prediction combines Seoul Bike data with weather data. The data used include trip duration, trip distance, pickup and dropoff latitude and longitude, temperature, precipitation, wind speed, humidity, solar radiation, snowfall, ground temperature and 1-hour average dust concentration. Feature engineering is applied to extract additional features from the data. Four statistical models are used to predict the trip duration: (a) linear regression, (b) gradient boosting machines, (c) k nearest neighbour and (d) random forest (RF). Four performance metrics, namely root mean squared error, coefficient of variance, mean absolute error and median absolute error, are used to determine the efficiency of the models. Of the four models, RF performs best, explaining 93% of the variance (R²) in the testing set and 98% in the training set. These results indicate that RF is effective for predicting trip duration.
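
As an illustration of the evaluation metrics named in the abstract, the sketch below computes them for a handful of made-up trip durations. It is not the authors' code. The function name, the sample values and the definition of the coefficient of variance (RMSE divided by the mean observed duration, expressed as a percentage) are assumptions, since the abstract does not give formulas.

```python
import math
from statistics import mean, median


def regression_metrics(y_true, y_pred):
    """Return RMSE, CV (%), MAE, median absolute error and R^2.

    Hypothetical helper for illustration only. CV is ASSUMED here to be
    RMSE / mean(y_true) * 100; the abstract does not define it.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    y_bar = mean(y_true)

    ss_res = sum(e * e for e in errors)               # residual sum of squares
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)    # total sum of squares
    rmse = math.sqrt(ss_res / len(errors))

    return {
        "rmse": rmse,
        "cv_percent": 100.0 * rmse / y_bar,
        "mae": mean(abs_errors),
        "medae": median(abs_errors),
        "r2": 1.0 - ss_res / ss_tot,
    }


# Made-up trip durations in minutes, not taken from the Seoul Bike data.
durations = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
scores = regression_metrics(durations, predicted)
```

R² is the share of variance explained, so a value of 0.93 on a held-out set corresponds to the "93%" quoted for RF on the testing set.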

Updated: 2020-11-03