ForeXGBoost: passenger car sales prediction based on XGBoost
Distributed and Parallel Databases ( IF 1.5 ) Pub Date : 2020-05-25 , DOI: 10.1007/s10619-020-07294-y
Zhenchang Xia , Shan Xue , Libing Wu , Jiaxin Sun , Yanjiao Chen , Rui Zhang

The rapid development of machine learning has spurred applications across many industries, where prediction models are built to forecast sales and help enterprises and governments plan better. In 2018, Alibaba Cloud and the Yancheng Municipal Government held a competition calling for global efforts to build machine learning models that accurately forecast vehicle sales from large-scale datasets. This paper presents the design, implementation and evaluation of ForeXGBoost, our proposed model, which won first place in the competition. ForeXGBoost uses carefully designed data filling algorithms for missing values to improve data quality. It extracts features from historical sales and production data with a sliding window, which improves prediction accuracy. We conduct an extensive study of the influence of different attributes on vehicle sales via information gain and data correlation, and on that basis select the most indicative features from the feature set for prediction. Furthermore, we leverage the XGBoost prediction algorithm to achieve high prediction accuracy with short running time. Extensive experiments confirm that ForeXGBoost achieves high prediction accuracy with low overhead.
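The sliding-window step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not list the actual features or window length, so the function name `sliding_window_features`, the feature names, the window of 3 months, and the sales figures below are all assumptions made for illustration, not the authors' implementation.

```python
from statistics import mean


def sliding_window_features(sales, window=3):
    """Build (features, target) rows from a monthly sales series.

    For each month t >= window, the features summarise the previous
    `window` months and the target is the sales figure at month t.
    Feature names are hypothetical, chosen for this sketch only.
    """
    rows = []
    for t in range(window, len(sales)):
        history = sales[t - window:t]  # the `window` months before month t
        features = {
            "lag_1": history[-1],          # most recent month in the window
            "window_mean": mean(history),  # average over the window
            "window_min": min(history),
            "window_max": max(history),
        }
        rows.append((features, sales[t]))
    return rows


# Made-up monthly sales for one car model (illustrative data only).
monthly_sales = [120, 135, 128, 150, 160, 155]
rows = sliding_window_features(monthly_sales, window=3)
```

Each row pairs a summary of the preceding months with the value to be predicted, which is the tabular shape that a gradient-boosted regressor such as XGBoost would typically be trained on.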

Updated: 2020-05-25