A probabilistic classifier ensemble weighting scheme based on cross-validated accuracy estimates
Data Mining and Knowledge Discovery (IF 4.8) Pub Date: 2019-06-17, DOI: 10.1007/s10618-019-00638-y
James Large , Jason Lines , Anthony Bagnall

Our hypothesis is that building ensembles of small sets of strong classifiers constructed with different learning algorithms is, on average, the best approach to classification for real-world problems. We propose a simple mechanism for building small heterogeneous ensembles based on exponentially weighting the probability estimates of the base classifiers with an estimate of the accuracy formed through cross-validation on the train data. We demonstrate through extensive experimentation that, given the same small set of base classifiers, this method has measurable benefits over commonly used alternative weighting, selection or meta-classifier approaches to heterogeneous ensembles. We also show that an ensemble of five well-known, fast classifiers is not significantly worse than large homogeneous ensembles and tuned individual classifiers on datasets from the UCI archive. We provide evidence that the performance of the cross-validation accuracy weighted probabilistic ensemble (CAWPE) generalises to a completely separate set of datasets, the UCR time series classification archive, and we also demonstrate that our ensemble technique can significantly improve the state-of-the-art classifier for this problem domain. We investigate the performance in more detail, and find that the improvement is most marked in problems with smaller train sets. We perform a sensitivity analysis and an ablation study to demonstrate the robustness of the ensemble and the significant contribution of each design element of the classifier. We conclude that it is, on average, better to ensemble strong classifiers with a weighting scheme rather than perform extensive tuning, and that CAWPE is a sensible starting point for combining classifiers.
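The core combination rule described above — exponentially weighting each base classifier's probability estimates by its cross-validated train accuracy — can be sketched in a few lines. The sketch below is an illustration, not the authors' implementation: the function name, the toy probability arrays, and the choice of exponent are assumptions for demonstration (the published method raises accuracy to a power α; α = 4 is used here as a plausible setting).

```python
import numpy as np

def cawpe_combine(probas, cv_accuracies, alpha=4.0):
    """Combine base-classifier probability estimates, CAWPE-style.

    probas:        list of (n_samples, n_classes) probability arrays,
                   one per base classifier.
    cv_accuracies: cross-validated train-set accuracy of each base
                   classifier (estimated beforehand on the train data).
    alpha:         exponent that amplifies differences in accuracy;
                   a hypothetical default for this sketch.
    """
    # Exponentiate the accuracy estimates to form the weights.
    weights = np.asarray(cv_accuracies) ** alpha
    # Weighted sum of the probability estimates across classifiers.
    combined = sum(w * p for w, p in zip(weights, probas))
    # Renormalise so each row is a valid probability distribution.
    combined /= combined.sum(axis=1, keepdims=True)
    return combined

# Toy example: two base classifiers, three test instances, two classes.
p1 = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])  # cv accuracy 0.9
p2 = np.array([[0.6, 0.4], [0.7, 0.3], [0.1, 0.9]])  # cv accuracy 0.6
probs = cawpe_combine([p1, p2], cv_accuracies=[0.9, 0.6])
preds = probs.argmax(axis=1)
```

On the second toy instance an unweighted average of the two estimates would favour class 0 (0.55 vs 0.45), whereas the exponential weighting lets the more accurate classifier dominate and the ensemble predicts class 1 — which is exactly the behaviour the weighting scheme is designed to produce.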

Updated: 2019-06-17