SuRF: A new method for sparse variable selection, with application in microbiome data analysis
Statistics in Medicine (IF 2), Pub Date: 2020-11-20, DOI: 10.1002/sim.8809
Lihui Liu 1 , Hong Gu 1 , Johan Van Limbergen 2 , Toby Kenney 1

In this article, we present a new variable selection method for regression and classification, particularly for microbiome analysis. Our method, called subsampling ranking forward selection (SuRF), is based on LASSO penalized regression, subsampling and forward-selection methods. SuRF offers major advantages over existing variable selection methods in terms of both sparsity of selected models and model inference. We provide an R package that implements our method for generalized linear models. We apply our method to classification problems from microbiome data, using a novel agglomeration approach to deal with the special tree-like correlation structure of the variables. Existing methods arbitrarily choose a taxonomic level a priori before performing the analysis, whereas by combining SuRF with these aggregated variables, we are able to identify the key biomarkers at the appropriate taxonomic level, as suggested by the data. We present simulations in multiple sparse settings to demonstrate that our approach recovers the true variables better than several widely used existing approaches. We apply SuRF to two microbiome datasets: one on predicting pouchitis and one on identifying samples from two healthy individuals. We find that SuRF provides prediction performance better than or comparable to that of other methods while controlling the false positive rate of variable selection.
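The three ingredients named in the abstract (LASSO penalized regression, subsampling and forward selection) can be combined as in the minimal Python sketch below. This is not the authors' algorithm and not the API of their R package; it only illustrates the general idea under stated assumptions. The function names (`lasso_cd`, `rank_by_subsampling`, `forward_select`), ranking by selection frequency, the fixed penalty `lam`, the relative residual-sum-of-squares stopping rule (`min_gain`) and the synthetic data are all hypothetical choices made for illustration. The sketch uses a linear model with centered data and no intercept.

```python
import random

def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate-descent LASSO for (1/2n)||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, kept up to date
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # covariance of column j with the residual that excludes feature j
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            # soft-thresholding update
            new = max(abs(rho) - lam, 0.0) / col_sq[j]
            new = new if rho > 0 else -new
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta

def rss(X, y, cols):
    """Residual sum of squares of an unpenalized least-squares refit on `cols`."""
    if not cols:
        return sum(v * v for v in y)
    Xs = [[row[j] for j in cols] for row in X]
    beta = lasso_cd(Xs, y, lam=0.0, n_iter=100)  # lam=0 converges to least squares
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for yi, row in zip(y, Xs))

def rank_by_subsampling(X, y, lam, n_subsamples=50, frac=0.5, seed=0):
    """Rank variables by how often LASSO selects them across random subsamples."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), int(frac * n))
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            counts[j] += beta[j] != 0.0
    # most frequently selected first; ties broken by column index
    return sorted(range(p), key=lambda j: (-counts[j], j))

def forward_select(X, y, ranking, min_gain=0.2):
    """Add variables in ranked order until the relative RSS drop falls below min_gain."""
    selected, current = [], rss(X, y, [])
    for j in ranking:
        candidate = rss(X, y, selected + [j])
        if current - candidate < min_gain * current:
            break  # assumed stopping rule, chosen only for this illustration
        selected.append(j)
        current = candidate
    return selected

# Synthetic example (assumed data): only columns 0 and 3 carry signal.
rng = random.Random(1)
n, p = 60, 8
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * row[0] - 2 * row[3] + rng.gauss(0, 0.5) for row in X]

ranking = rank_by_subsampling(X, y, lam=0.5)
selected = forward_select(X, y, ranking)
print("ranking:", ranking)
print("selected:", sorted(selected))
```

Ranking on subsamples and then refitting without a penalty separates two questions: which variables are stable candidates, and how many of them the final model should keep. A fixed gain threshold is the simplest possible stopping rule; a formal significance test would be needed for any claim about false positive control.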

Updated: 2021-01-13