Predictive Distribution Modeling Using Transformation Forests, Journal of Computational and Graphical Statistics



Journal of Computational and Graphical Statistics (IF 2.4), Pub Date: 2021-03-08, DOI: 10.1080/10618600.2021.1872581
Torsten Hothorn, Achim Zeileis

Abstract

Regression models for supervised learning problems with a continuous response are commonly understood as models for the conditional mean of the response given predictors. This notion is simple and therefore appealing for interpretation and visualization. Information about the whole underlying conditional distribution is, however, not available from these models. A more general understanding of regression models as models for conditional distributions allows much broader inference, for example, the computation of prediction intervals or probabilistic predictions for exceeding certain thresholds. Several random forest-type algorithms aim at estimating conditional distributions, most prominently quantile regression forests. We propose a novel approach based on a parametric family of distributions characterized by their transformation function. A dedicated novel “transformation tree” algorithm able to detect distributional changes is developed. Based on these transformation trees, we introduce “transformation forests” as an adaptive local likelihood estimator of conditional distribution functions. The resulting predictive distributions are fully parametric yet very general and allow inference procedures, such as likelihood-based variable importances, to be applied in a straightforward way. Supplemental files for this article are available online.
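The contrast between a conditional mean and a full conditional distribution can be illustrated with a minimal sketch. It is not the authors' transformation forest: the normal family, the Gaussian kernel weights (a hypothetical stand-in for the data-adaptive weights a forest would supply), the function name `local_normal_fit`, and the toy data are all assumptions made for illustration. The sketch fits a weighted maximum-likelihood normal distribution around a query point, then reads a prediction interval and a threshold-exceedance probability off the fitted distribution.

```python
import math
from statistics import NormalDist


def local_normal_fit(x0, xs, ys, bandwidth=1.0):
    """Weighted maximum-likelihood fit of N(mu, sigma^2) around x0.

    Assumption: Gaussian kernel weights replace the forest-based
    weights of an adaptive local likelihood estimator.
    """
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    # Closed-form maximizers of the weighted normal log-likelihood
    mu = sum(w * y for w, y in zip(weights, ys)) / total
    var = sum(w * (y - mu) ** 2 for w, y in zip(weights, ys)) / total
    return NormalDist(mu, math.sqrt(var))


# Toy data, made up for illustration: same mean in both groups,
# but the spread of y is much larger at x = 5 than at x = 0.
xs = [0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0]
ys = [9.0, 10.0, 10.0, 11.0, 4.0, 8.0, 12.0, 16.0]

dist_low = local_normal_fit(0.0, xs, ys)
dist_high = local_normal_fit(5.0, xs, ys)

# 90% prediction intervals from the fitted conditional distributions
interval_low = (dist_low.inv_cdf(0.05), dist_low.inv_cdf(0.95))
interval_high = (dist_high.inv_cdf(0.05), dist_high.inv_cdf(0.95))

# Probability that the response exceeds the threshold 12
p_exceed_low = 1.0 - dist_low.cdf(12.0)
p_exceed_high = 1.0 - dist_high.cdf(12.0)
```

In this toy setting a conditional-mean model would predict roughly the same value at both query points, while the fitted distributions differ clearly in interval width and exceedance probability. That difference is the kind of distributional change a mean-only model cannot express.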


Updated: 2021-03-08
