AMF: Aggregated Mondrian forests for online learning, The Journal of the Royal Statistical Society, Series B (Statistical Methodology)


AMF: Aggregated Mondrian forests for online learning
The Journal of the Royal Statistical Society, Series B (Statistical Methodology) (IF 5.8), Pub Date: 2021-05-19, DOI: 10.1111/rssb.12425
Jaouad Mourtada¹, Stéphane Gaïffas²,³, Erwan Scornet¹


Random forest (RF) is one of the algorithms of choice in many supervised learning applications, be it classification or regression. The appeal of such tree-ensemble methods comes from a combination of several characteristics: remarkable accuracy in a variety of tasks, a small number of parameters to tune, robustness to feature scaling, a reasonable computational cost for training and prediction, and suitability for high-dimensional settings. The most commonly used RF variants, however, are ‘offline’ algorithms, which require the whole dataset to be available at once. In this paper, we introduce AMF, an online RF algorithm based on Mondrian Forests. Using a variant of the context tree weighting algorithm, we show that it is possible to efficiently perform an exact aggregation over all prunings of the trees; in particular, this yields a truly online, parameter-free algorithm that is competitive with the optimal pruning of the Mondrian tree, and thus adaptive to the unknown regularity of the regression function. Numerical experiments show that AMF is competitive with several strong baselines on a large number of datasets for multi-class classification.
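The exact aggregation over all prunings mentioned in the abstract can be illustrated with a small sketch. The abstract does not spell out the recursion, so the code below assumes one common form of context tree weighting: each node carries a weight exp(-eta * loss), and an internal node averages its own weight with the product of its children's aggregated weights. The `Node` class, the `eta` parameter, the 1/2 prior at each split, and the loss values are all hypothetical choices made for illustration, not the authors' implementation. The sketch checks that the bottom-up recursion matches a brute-force sum over every pruning.

```python
import math
from dataclasses import dataclass
from itertools import product
from typing import Iterator, Optional, Tuple


@dataclass
class Node:
    """Hypothetical tree node holding the cumulative loss of its own predictor."""
    loss: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def ctw_weight(node: Node, eta: float = 1.0) -> float:
    """Aggregated weight of all prunings of the subtree, visiting each node once."""
    w = math.exp(-eta * node.loss)
    if node.is_leaf:
        return w
    # Assumed prior: prune here with probability 1/2, keep the split otherwise.
    return 0.5 * w + 0.5 * ctw_weight(node.left, eta) * ctw_weight(node.right, eta)


def prunings(node: Node, eta: float = 1.0) -> Iterator[Tuple[float, float]]:
    """Yield (prior, weight) for every pruning of the subtree, by brute force."""
    w = math.exp(-eta * node.loss)
    if node.is_leaf:
        yield 1.0, w
        return
    yield 0.5, w  # prune here: this node becomes a leaf of the pruning
    for (pl, wl), (pr, wr) in product(prunings(node.left, eta),
                                      prunings(node.right, eta)):
        yield 0.5 * pl * pr, wl * wr  # keep the split and combine both sides


# Illustrative depth-2 tree with made-up cumulative losses.
tree = Node(2.0,
            Node(0.7, Node(0.1), Node(0.4)),
            Node(1.1, Node(0.9), Node(0.3)))

all_prunings = list(prunings(tree))
brute_force = sum(prior * weight for prior, weight in all_prunings)
recursive = ctw_weight(tree)
```

Under these assumptions the two quantities agree, while the recursion touches each node only once; the number of prunings enumerated by the brute-force version grows very quickly with tree depth, which is why a recursion of this kind is what makes exact aggregation tractable.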


Updated: 2021-05-19
