A note on the interpretation of tree‐based regression models, Biometrical Journal


A note on the interpretation of tree‐based regression models
Biometrical Journal (IF 1.3) Pub Date: 2020-05-25, DOI: 10.1002/bimj.201900195
Anna Gottard _{1,2}, Giulia Vannucci _{1}, Giovanni Maria Marchetti _{1,2}


Tree-based models are a popular tool for predicting a response from a set of explanatory variables when the regression function has a certain degree of complexity. They are sometimes also used to identify important variables and for variable selection. We show that if the generating model contains chains of direct and indirect effects, then the typical variable importance measures suggest selecting as important mainly the background variables, which have a strong indirect effect, while disregarding the variables that directly influence the response. This is attributable mainly to how the splitting variable is chosen in the first steps of the algorithm and to the greedy nature of that search. This pitfall could be relevant when tree-based algorithms are used to understand the underlying generating process, for population segmentation, and for causal inference.
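The mechanism described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' design: the data-generating model, the sample size, and the helper names (`simulate`, `best_split_gain`, `first_split_gains`) are all assumptions made for illustration. A single background variable drives five variables that each affect the response directly, while the background variable itself has no direct effect. The code then computes, for each variable, the largest reduction in squared error that a single threshold split could achieve, which is the criterion a greedy regression tree would typically use at the root.

```python
import random


def simulate(n=2000, n_direct=5, seed=0):
    """Hypothetical chain: background -> direct_1..direct_k -> response.

    The background variable affects the response only through the
    direct variables (an assumption of this sketch).
    """
    rng = random.Random(seed)
    rows, y = [], []
    for _ in range(n):
        background = rng.gauss(0.0, 1.0)
        direct = [background + rng.gauss(0.0, 1.0) for _ in range(n_direct)]
        rows.append([background] + direct)
        # The response depends on the direct variables and noise only.
        y.append(sum(direct) + rng.gauss(0.0, 1.0))
    return rows, y


def best_split_gain(x, y):
    """Largest drop in the sum of squared errors from one threshold split on x."""
    n = len(y)
    order = sorted(range(n), key=lambda i: x[i])
    total = sum(y)
    parent_term = total * total / n
    left_sum = 0.0
    best = 0.0
    for k in range(1, n):
        left_sum += y[order[k - 1]]
        if x[order[k - 1]] == x[order[k]]:
            continue  # cannot place a threshold between tied values
        right_sum = total - left_sum
        # SSE(parent) - SSE(left) - SSE(right) simplifies to this expression.
        gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - parent_term
        best = max(best, gain)
    return best


def first_split_gains(rows, y):
    """Best root-split gain for every column of the design matrix."""
    n_cols = len(rows[0])
    return [best_split_gain([row[j] for row in rows], y) for j in range(n_cols)]


if __name__ == "__main__":
    rows, y = simulate()
    gains = first_split_gains(rows, y)
    names = ["background"] + [f"direct_{j}" for j in range(1, len(gains))]
    for name, gain in sorted(zip(names, gains), key=lambda t: -t[1]):
        print(f"{name:>12s}  gain = {gain:10.1f}")
```

Under these assumptions the background variable summarizes all five direct variables at once, so its marginal association with the response is stronger than that of any single direct variable, and a greedy search would be expected to pick it for the first split even though it has no direct effect.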


Updated: 2020-05-25
