Ensemble of optimal trees, random forest and random projection ensemble classification
Advances in Data Analysis and Classification (IF 1.6), Pub Date: 2019-06-12, DOI: 10.1007/s11634-019-00364-9
Zardad Khan, Asma Gul, Aris Perperoglou, Miftahuddin Miftahuddin, Osama Mahmoud, Werner Adler, Berthold Lausen

The predictive performance of a random forest ensemble is highly associated with the strength of the individual trees and their diversity. An ensemble of a small number of accurate and diverse trees will also reduce the computational burden, provided prediction accuracy is not compromised. We investigate the idea of integrating trees that are both accurate and diverse. For this purpose, we use the out-of-bag observations from the training bootstrap samples as a validation sample to choose the best trees based on their individual performance, and then assess these trees for diversity using the Brier score on an independent validation sample. Starting from the best tree, a tree is selected for the final ensemble if adding it reduces the error of the ensemble of trees already selected. Unlike random projection ensemble classification, our approach does not use an implicit dimension reduction for each tree. A total of 35 benchmark problems on classification and regression are used to assess the performance of the proposed method and to compare it with random forest, random projection ensemble, node harvest, support vector machine, kNN, and classification and regression tree. We compute unexplained variances or classification error rates for all the methods on the corresponding data sets. Our experiments reveal that the size of the ensemble is reduced significantly and that better results are obtained in most cases. Results of a simulation study are also given, in which four tree-style scenarios are considered to generate data sets with several structures.
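
The two-stage selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes binary classification, assumes each tree's out-of-bag error and its predicted class-1 probabilities on the independent validation sample have already been computed, and uses hypothetical names throughout (`select_trees`, `top_fraction`, `oob_errors`, `val_probs`). The fraction of top-ranked trees kept in the first stage is an illustrative parameter; the abstract does not state a value.

```python
from typing import List, Sequence


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared difference between predicted P(y=1) and the 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def ensemble_probs(tree_probs: Sequence[Sequence[float]],
                   members: Sequence[int]) -> List[float]:
    """Average the member trees' probabilities, observation by observation."""
    n_obs = len(tree_probs[members[0]])
    return [sum(tree_probs[m][i] for m in members) / len(members)
            for i in range(n_obs)]


def select_trees(oob_errors: Sequence[float],
                 val_probs: Sequence[Sequence[float]],
                 val_labels: Sequence[int],
                 top_fraction: float = 0.2) -> List[int]:
    """Return indices of the trees kept in the final ensemble.

    oob_errors[t] : out-of-bag error of tree t (lower is better)
    val_probs[t]  : tree t's predicted P(y=1) on the validation sample
    top_fraction  : hypothetical share of best trees passed to stage 2
    """
    # Stage 1: rank trees by individual out-of-bag performance.
    ranked = sorted(range(len(oob_errors)), key=lambda t: oob_errors[t])
    n_keep = max(1, int(len(ranked) * top_fraction))
    candidates = ranked[:n_keep]

    # Stage 2: start from the best tree, then add a candidate only if
    # the ensemble's Brier score on the validation sample decreases.
    selected = [candidates[0]]
    best = brier_score(ensemble_probs(val_probs, selected), val_labels)
    for t in candidates[1:]:
        trial = selected + [t]
        score = brier_score(ensemble_probs(val_probs, trial), val_labels)
        if score < best:
            selected, best = trial, score
    return selected


if __name__ == "__main__":
    # Toy inputs (made up for illustration, not data from the paper).
    labels = [1, 0, 1, 0]
    oob = [0.10, 0.30, 0.20, 0.50]
    probs = [
        [0.9, 0.2, 0.6, 0.4],  # tree 0: best out-of-bag error
        [0.6, 0.4, 0.9, 0.1],  # tree 1: errs on different observations
        [0.9, 0.2, 0.6, 0.4],  # tree 2: duplicate of tree 0, adds no diversity
        [0.1, 0.9, 0.2, 0.8],  # tree 3: poor tree, dropped in stage 1
    ]
    print(select_trees(oob, probs, labels, top_fraction=0.75))  # [0, 1]
```

In the toy run, tree 2 is accurate but identical to tree 0, so adding it leaves the Brier score unchanged and it is rejected. Tree 1 is individually weaker but makes its errors on different observations, so averaging it with tree 0 lowers the Brier score and it is kept. That is the sense in which the second stage rewards diversity as well as accuracy.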

Updated: 2019-06-12