Residual-based tree for clustered binary data
Statistics and Its Interface (IF 0.8). Pub Date: 2021-01-01. DOI: 10.4310/20-sii638
Rong Xia¹, Christopher R. Friese², Mousumi Banerjee¹

Tree-based methods are widely used for classification in health sciences research, where data are often clustered. In this paper, we propose a variant of the standard classification and regression tree (CART) paradigm to handle clustered binary outcomes. Using residuals from a null generalized linear mixed model as the response, we build a regression tree that partitions the covariate space into rectangles. This avoids modeling the correlation structure explicitly while still accounting for the cluster-correlated design, which allows us to adopt the standard CART machinery for tree growing, pruning, and cross-validation. The class prediction for each terminal node of the final tree is based on the success probability within that node. Our method also extends easily to ensembles of trees and random forests. Using extensive simulations, we compare our residual-based trees with the standard classification tree. Finally, we illustrate the methods using data from a study of kidney cancer and a study of surgical mortality after colectomy.
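The abstract outlines a three-step pipeline: obtain cluster-adjusted residuals from a null model, grow a least-squares regression tree on those residuals, and label each terminal node by its observed success probability. The Python sketch that follows illustrates that pipeline under stated assumptions; it is not the authors' implementation. In particular, instead of fitting a true null GLMM it uses a hypothetical shrinkage stand-in (`null_model_residuals`, with an assumed `prior_strength` parameter) that pulls each cluster's success rate toward the pooled rate. The 0.5 classification cut-off and the `max_depth` and `min_node` settings are also assumptions, and pruning and cross-validation are omitted.

```python
from statistics import mean


def null_model_residuals(y, cluster, prior_strength=2.0):
    """HYPOTHETICAL stand-in for residuals from a null random-intercept
    logistic GLMM: each cluster's success probability is shrunk toward the
    pooled rate, and the residual is y minus that shrunken probability."""
    pooled = mean(y)
    counts = {}
    for yi, c in zip(y, cluster):
        s, n = counts.get(c, (0, 0))
        counts[c] = (s + yi, n + 1)
    fitted = {c: (s + prior_strength * pooled) / (n + prior_strength)
              for c, (s, n) in counts.items()}
    return [yi - fitted[c] for yi, c in zip(y, cluster)]


def sse(values):
    """Sum of squared deviations from the mean (CART least-squares impurity)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(X, r, idx, min_node=2):
    """Search every covariate and cut point for the split of rows `idx`
    that minimizes the total within-node sum of squared residuals."""
    best = None  # (sse, feature index, threshold)
    for j in range(len(X[0])):
        order = sorted(idx, key=lambda i: X[i][j])
        for k in range(min_node, len(order) - min_node + 1):
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot cut between tied covariate values
            total = sse([r[i] for i in order[:k]]) + sse([r[i] for i in order[k:]])
            if best is None or total < best[0]:
                best = (total, j, (lo + hi) / 2)
    return best


def grow(X, y, r, idx, depth=0, max_depth=2, min_node=2):
    """Grow a regression tree on the residuals r; each leaf stores the
    observed success probability of y and the resulting class label."""
    split = best_split(X, r, idx, min_node)
    if depth == max_depth or split is None or split[0] >= sse([r[i] for i in idx]):
        p = mean(y[i] for i in idx)
        return {"p": p, "cls": int(p >= 0.5)}  # ASSUMED 0.5 cut-off
    _, j, t = split
    left = [i for i in idx if X[i][j] <= t]
    right = [i for i in idx if X[i][j] > t]
    return {"feature": j, "threshold": t,
            "left": grow(X, y, r, left, depth + 1, max_depth, min_node),
            "right": grow(X, y, r, right, depth + 1, max_depth, min_node)}


def predict(tree, x):
    """Route one covariate vector to its terminal node and return its class."""
    while "cls" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["cls"]


if __name__ == "__main__":
    # Toy data: one covariate, two clusters with equal success rates.
    X = [[1], [2], [3], [4], [5], [6], [7], [8]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    cluster = ["a", "a", "b", "b", "a", "a", "b", "b"]
    r = null_model_residuals(y, cluster)
    tree = grow(X, y, r, list(range(len(y))))
    print(tree["threshold"], predict(tree, [2]), predict(tree, [7]))  # prints "4.5 0 1"
```

Because the tree only ever sees the residuals, the clustering enters solely through the null-model step; replacing the stand-in with residuals from an actual random-intercept logistic fit would leave `best_split`, `grow`, and `predict` unchanged.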

Updated: 2021-02-10