Ensembles of extremely randomized predictive clustering trees for predicting structured outputs
Machine Learning (IF 4.3) Pub Date: 2020-08-17, DOI: 10.1007/s10994-020-05894-4
Dragi Kocev , Michelangelo Ceci , Tomaž Stepišnik

We address the task of learning ensembles of predictive models for structured output prediction (SOP). We focus on three SOP tasks: multi-target regression (MTR), multi-label classification (MLC) and hierarchical multi-label classification (HMC). In contrast to standard classification and regression, where the output is a single (discrete or continuous) variable, in SOP the output is a data structure: a tuple of continuous variables (MTR), a tuple of binary variables (MLC), or a tuple of binary variables with hierarchical dependencies (HMC). SOP is gaining increasing interest in the research community due to its applicability in a variety of practically relevant domains. In this context, we consider the Extra-Tree ensemble learning method, the overall top performer in the DREAM4 and DREAM5 challenges for gene network reconstruction. We extend this method to SOP tasks and call the extension Extra-PCTs ensembles. As base predictive models we propose using predictive clustering trees (PCTs), a generalization of decision trees for predicting structured outputs. We conduct a comprehensive experimental evaluation of the proposed method on a collection of 41 benchmark datasets: 21 for MTR, 10 for MLC and 10 for HMC. We first investigate the influence of the size of the ensemble and the size of the feature subset considered at each node. We then compare the performance of Extra-PCTs to other ensemble methods (random forests and bagging), as well as to single PCTs. The experimental evaluation reveals that Extra-PCTs achieve the best trade-off between predictive power and computational cost with 50 base predictive models, across all three tasks. The recommended feature subset sizes vary across the tasks, and also depend on whether the dataset contains only binary and/or sparse attributes. Extra-PCTs give better predictive performance than a single tree (the differences are typically statistically significant).
Moreover, the Extra-PCTs are the best performing ensemble method (except for the MLC task, where performances are similar to those of random forests), and Extra-PCTs can be used to learn good feature rankings for all of the tasks considered here.
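To make the core idea concrete: Extra-Trees-style learners differ from random forests in that, at each node, they draw a small number of random (feature, cut point) candidates instead of searching exhaustively for the best threshold; for structured outputs, the split quality is the variance reduction summed over all target dimensions. The sketch below is a minimal, stdlib-only illustration of that split heuristic for multi-target regression, assuming plain lists of feature and target vectors; it is not the authors' implementation, and the function names are illustrative.

```python
import random

def variance(ys):
    """Total variance of a set of target vectors, summed over all target
    dimensions (the multi-target generalization of single-output variance)."""
    n = len(ys)
    tot = 0.0
    for j in range(len(ys[0])):
        mean = sum(y[j] for y in ys) / n
        tot += sum((y[j] - mean) ** 2 for y in ys) / n
    return tot

def extra_split(X, Y, k, rng):
    """Extremely randomized split for multi-target regression: draw k random
    (feature, cut point) candidates and keep the one with the largest
    variance reduction over the parent node. Returns (score, feature, cut)
    or None if no valid split was found."""
    n_features = len(X[0])
    parent_var = variance(Y)
    best = None
    for _ in range(k):
        f = rng.randrange(n_features)              # random feature
        lo = min(x[f] for x in X)
        hi = max(x[f] for x in X)
        if lo == hi:                               # constant feature: skip
            continue
        cut = rng.uniform(lo, hi)                  # random cut point
        left = [y for x, y in zip(X, Y) if x[f] <= cut]
        right = [y for x, y in zip(X, Y) if x[f] > cut]
        if not left or not right:
            continue
        # weighted variance of the children vs. the parent
        child_var = (len(left) * variance(left)
                     + len(right) * variance(right)) / len(X)
        score = parent_var - child_var
        if best is None or score > best[0]:
            best = (score, f, cut)
    return best

# Toy data: one feature that cleanly separates two target clusters.
X = [[0.0], [0.1], [0.9], [1.0]]
Y = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
split = extra_split(X, Y, k=8, rng=random.Random(0))
```

Recursing this split selection builds one extremely randomized PCT; an Extra-PCTs ensemble would grow many such trees (the abstract recommends around 50) and average their predictions.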
