A comparative analysis of gradient boosting algorithms
Artificial Intelligence Review (IF 10.7) Pub Date: 2020-08-24, DOI: 10.1007/s10462-020-09896-5
Candice Bentéjac, Anna Csörgő, Gonzalo Martínez-Muñoz

The family of gradient boosting algorithms has recently been extended with several interesting proposals (namely XGBoost, LightGBM and CatBoost) that focus on both speed and accuracy. XGBoost is a scalable ensemble technique that has proven to be a reliable and efficient machine learning challenge solver. LightGBM is an accurate model focused on providing extremely fast training performance through selective sampling of high-gradient instances. CatBoost modifies the computation of gradients to avoid the prediction shift and thereby improve the accuracy of the model. This work presents a practical analysis of how these novel variants of gradient boosting behave in terms of training speed, generalization performance and hyper-parameter setup. In addition, a comprehensive comparison between XGBoost, LightGBM, CatBoost, random forests and gradient boosting has been performed using carefully tuned models as well as their default settings. The results of this comparison indicate that CatBoost obtains the best results in generalization accuracy and AUC on the studied datasets, although the differences are small. LightGBM is the fastest of all the methods but not the most accurate. XGBoost places second both in accuracy and in training speed. Finally, an extensive analysis of the effect of hyper-parameter tuning in XGBoost, LightGBM and CatBoost is carried out using two novel proposed tools.
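A minimal sketch of the kind of default-settings comparison the abstract describes, using scikit-learn's gradient boosting and random forest implementations on a synthetic dataset. This is not the paper's benchmark: the dataset, metrics loop, and use of scikit-learn in place of the XGBoost/LightGBM/CatBoost libraries are illustrative assumptions.

```python
# Illustrative only: compares two ensemble methods at their default settings,
# reporting test accuracy and AUC, mirroring the metrics used in the paper.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification task (stand-in for the studied datasets)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for name, model in [
    ("gradient boosting", GradientBoostingClassifier(random_state=0)),
    ("random forest", RandomForestClassifier(random_state=0)),
]:
    model.fit(X_tr, y_tr)                      # default hyper-parameters
    proba = model.predict_proba(X_te)[:, 1]    # positive-class probabilities
    results[name] = (
        accuracy_score(y_te, model.predict(X_te)),
        roc_auc_score(y_te, proba),
    )

for name, (acc, auc) in results.items():
    print(f"{name}: accuracy={acc:.3f}, AUC={auc:.3f}")
```

The same loop extends naturally to `xgboost.XGBClassifier`, `lightgbm.LGBMClassifier`, and `catboost.CatBoostClassifier` when those packages are installed, since all three expose the same scikit-learn-style `fit`/`predict_proba` interface.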
