Subsampling from features in large regression to find “winning features”
Statistical Analysis and Data Mining (IF 2.1), Pub Date: 2021-02-27, DOI: 10.1002/sam.11499
Yiying Fan, Jiayang Sun

Feature (or variable) selection from a large number p of features is a continuing challenge in data science, especially for ever-growing data and when the goal is to discover scientifically important features in a regression setting. For example, to develop valid drug targets for ovarian cancer, we must control the false-discovery rate (FDR) of a selection procedure. The popular approach to feature selection in large-p regression uses penalized likelihood or shrinkage estimation, such as the LASSO, SCAD, Elastic Net, or MCP procedures. We present a different approach called the Subsampling Winner algorithm (SWA), which subsamples from the p features. The idea of SWA is analogous to the selection of US National Merit Scholars: semifinalists are chosen based on students' performance in tests taken at local schools (the subsample analyses), and the finalists (the winning features) are then determined from among the semifinalists. Due to its subsampling nature, SWA can scale to data of any dimension. Compared with the penalized and Random Forest procedures, SWA also has the best-controlled FDR while maintaining a competitive true-feature discovery rate. Our application of SWA to ovarian cancer data revealed functionally important genes and pathways.

Updated: 2021-03-15