Feature selection for data integration with mixed multiview data
Annals of Applied Statistics (IF 1.3), Pub Date: 2020-12-19, DOI: 10.1214/20-aoas1389
Yulia Baker, Tiffany M. Tang, Genevera I. Allen

Data integration methods that analyze multiple sources of data simultaneously can often provide more holistic insights than can separate inquiries of each data source. Motivated by the advantages of data integration in the era of “big data,” we investigate feature selection for high-dimensional multiview data with mixed data types (e.g., continuous, binary, count-valued). This heterogeneity of multiview data poses numerous challenges for existing feature selection methods. However, after critically examining these issues through empirical and theoretically guided lenses, we develop a practical solution, the Block Randomized Adaptive Iterative Lasso (B-RAIL), which combines the strengths of the randomized Lasso, adaptive weighting schemes, and stability selection. B-RAIL serves as a versatile data integration method for sparse regression and graph selection, and we demonstrate the effectiveness of B-RAIL through extensive simulations and a case study to infer the ovarian cancer gene regulatory network. In this case study, B-RAIL successfully identifies well-known biomarkers associated with ovarian cancer and hints at novel candidates for future ovarian cancer research.
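
To make two of the ingredients named in the abstract concrete, the following is a minimal sketch of the generic randomized Lasso combined with stability selection. It is not the B-RAIL algorithm: the block structure, the adaptive weighting, and the handling of mixed data types are all omitted, and it assumes a single continuous response with mean-zero features and no intercept. All function names, parameter values (`lam`, `alpha`, `threshold`), and the simulated data are hypothetical choices made for illustration.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, penalties, n_iter=50):
    """Coordinate descent for (1/2n)||y - Xb||^2 + sum_j penalties[j] * |b_j|."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the partial residual
            # (the residual with feature j's own contribution added back).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_b = soft_threshold(rho, penalties[j]) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_b
    return beta


def stability_selection(X, y, lam=0.1, alpha=0.5, n_resamples=50,
                        threshold=0.6, seed=0):
    """Selection frequencies of the randomized Lasso over half-size subsamples."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_resamples):
        idx = rng.sample(range(n), n // 2)  # subsample without replacement
        Xs = [X[i] for i in idx]
        ys = [y[i] for i in idx]
        # Randomized Lasso: feature j is penalized by lam / W_j,
        # with W_j drawn uniformly from [alpha, 1].
        penalties = [lam / rng.uniform(alpha, 1.0) for _ in range(p)]
        beta = weighted_lasso(Xs, ys, penalties)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    freqs = [c / n_resamples for c in counts]
    selected = [j for j in range(p) if freqs[j] >= threshold]
    return selected, freqs


def make_data(n=100, p=8, seed=1):
    """Hypothetical simulated data: only features 0 and 3 affect the response."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[0] - 1.5 * row[3] + rng.gauss(0, 0.5) for row in X]
    return X, y


X, y = make_data()
selected, freqs = stability_selection(X, y)
print("selected features:", selected)
print("selection frequencies:", freqs)
```

Features that are selected in a large fraction of the perturbed fits are kept, which is the idea stability selection contributes. The per-feature random penalties are what distinguish the randomized Lasso from a plain Lasso refit on each subsample.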

Updated: 2020-12-20