A Weighted Random Forests Approach to Improve Predictive Performance.
Statistical Analysis and Data Mining (IF 1.3), Pub Date: 2013-07-08, DOI: 10.1002/sam.11196
Stacey J Winham, Robert R Freimuth, Joanna M Biernacka

Identifying genetic variants associated with complex disease in high‐dimensional data is a challenging problem, and complicated etiologies such as gene–gene interactions are often ignored in analyses. The data‐mining method random forests (RF) can handle high dimensions; however, in high‐dimensional data, RF is not an effective filter for identifying risk factors associated with the disease trait via complex genetic models such as gene–gene interactions without strong marginal components. In this article we propose an extension called weighted random forests (wRF), which incorporates tree‐level weights to emphasize more accurate trees in prediction and calculation of variable importance. We demonstrate through simulation and application to data from a genetic study of addiction that wRF can outperform RF in high‐dimensional data, although the improvements are modest and limited to situations with effect sizes that are larger than what is realistic in genetics of complex disease. Thus, the current implementation of wRF is unlikely to improve detection of relevant predictors in high‐dimensional genetic data, but may be applicable in other situations where larger effect sizes are anticipated. © 2013 Wiley Periodicals, Inc. Statistical Analysis and Data Mining, 2013
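The idea of tree-level weights can be illustrated with a minimal sketch. The abstract does not state the weighting formula, so this example assumes each tree's weight is proportional to one minus its out-of-bag error rate; the function names (`tree_weights`, `weighted_vote`, `majority_vote`) and the toy numbers are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def tree_weights(oob_errors):
    """Convert per-tree out-of-bag error rates into weights that sum to 1.

    Assumption (not from the paper): weight is proportional to 1 - error.
    """
    raw = [1.0 - e for e in oob_errors]
    total = sum(raw)
    if total == 0:
        # Every tree is always wrong; fall back to equal weights.
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


def weighted_vote(tree_predictions, weights):
    """Return the class label with the largest summed tree weight."""
    scores = defaultdict(float)
    for label, w in zip(tree_predictions, weights):
        scores[label] += w
    # On an exact tie, the label that was seen first wins.
    return max(scores, key=scores.get)


def majority_vote(tree_predictions):
    """Standard random forest vote: every tree counts equally."""
    return weighted_vote(tree_predictions, [1.0] * len(tree_predictions))


# Toy example: two accurate trees disagree with three near-chance trees.
predictions = ["case", "case", "control", "control", "control"]
oob_errors = [0.10, 0.10, 0.45, 0.45, 0.45]

print(majority_vote(predictions))
print(weighted_vote(predictions, tree_weights(oob_errors)))
```

In this toy setting the unweighted vote follows the three weaker trees, while the weighted vote follows the two more accurate ones. The same weights could in principle be applied when aggregating per-tree variable importance, which is the second use the abstract mentions.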

Updated: 2013-07-08