AdaReg: data adaptive robust estimation in linear regression with application in GTEx gene expressions
Statistical Applications in Genetics and Molecular Biology (IF 0.9), Pub Date: 2021-07-12, DOI: 10.1515/sagmb-2020-0042
Meng Wang, Lihua Jiang, Michael P Snyder

The Genotype-Tissue Expression (GTEx) project provides a valuable resource of large-scale gene expression data across multiple tissue types. In the presence of technical noise and unknown or unmeasured factors, robustly estimating the major tissue effect is challenging. Moreover, different genes exhibit heterogeneous expression across tissue types. We therefore need a robust method that adapts to this heterogeneity to improve estimation of the tissue effect. We followed the robust estimation approach based on the γ-density-power-weight in Fujisawa, H. and Eguchi, S. (2008). Robust parameter estimation with a small bias against heavy contamination. J. Multivariate Anal. 99: 2053–2081 and Windham, M.P. (1995). Robustifying model fitting. J. Roy. Stat. Soc. B: 599–609, where γ is the exponent of the density weight and controls the balance between bias and variance. To our knowledge, this work is the first to propose a procedure for tuning γ to balance the bias-variance trade-off under mixture models. We constructed a robust likelihood criterion based on weighted densities in a mixture model in which a Gaussian population distribution is mixed with an unknown outlier distribution, and developed a data-adaptive γ-selection procedure embedded in the robust estimation. We provided a heuristic analysis of the selection criterion and found, in simulation studies under a series of settings, that the average trend of our practical selection criterion across values of γ captures the minimizing γ about as well as the trend of the mean squared error (MSE), which cannot be estimated in practice. Our data-adaptive robustifying procedure for the linear regression problem (AdaReg) showed a significant advantage over the fixed-γ procedure and other robust methods, both in simulation studies and in a real-data application estimating the tissue effect of heart samples from the GTEx project.
Finally, the paper discussed some limitations of the method and directions for future work.
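
To make the role of γ concrete, the following is a minimal sketch of fixed-γ density-power-weighted estimation for a Gaussian mean and variance, in the spirit of the Windham and Fujisawa-Eguchi approach cited above. It is not the AdaReg procedure itself: the abstract does not give the selection criterion, so the data-adaptive choice of γ is not reproduced here. The function name `gamma_weighted_gaussian`, the median/MAD starting values, and the stopping rule are illustrative assumptions.

```python
import math
import statistics


def gamma_weighted_gaussian(x, gamma, n_iter=100, tol=1e-10):
    """Illustrative fixed-gamma estimate of a Gaussian mean and variance.

    Each observation is weighted by the current Gaussian density raised to
    the power gamma, so points far from the bulk receive almost no weight.
    gamma = 0 gives equal weights, i.e. the ordinary maximum likelihood fit.
    """
    # Assumed robust starting values: median and scaled MAD.
    mu = statistics.median(x)
    mad = statistics.median([abs(v - mu) for v in x])
    var = (1.4826 * mad) ** 2 or 1.0  # fall back to 1.0 if the MAD is zero

    for _ in range(n_iter):
        # Weights proportional to the Gaussian density to the power gamma.
        w = [math.exp(-gamma * (v - mu) ** 2 / (2.0 * var)) for v in x]
        total = sum(w)
        new_mu = sum(wi * v for wi, v in zip(w, x)) / total
        # The factor (1 + gamma) undoes the shrinkage that the weighting
        # induces in the weighted variance under a Gaussian model.
        new_var = (1.0 + gamma) * sum(
            wi * (v - new_mu) ** 2 for wi, v in zip(w, x)
        ) / total
        converged = abs(new_mu - mu) < tol and abs(new_var - var) < tol
        mu, var = new_mu, new_var
        if converged:
            break
    return mu, var


# Hypothetical data: nine values centred on zero plus two gross outliers.
data = [-1.5, -1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0, 1.5, 20.0, 22.0]
mle_mu, mle_var = gamma_weighted_gaussian(data, gamma=0.0)
rob_mu, rob_var = gamma_weighted_gaussian(data, gamma=0.5)
print(mle_mu, rob_mu)
```

With γ = 0 the estimate coincides with the sample mean and is pulled toward the outliers; with γ = 0.5 the outliers are effectively ignored. A larger γ buys more robustness at the cost of higher variance on clean data, which is the trade-off the paper's data-adaptive selection is meant to balance.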

Updated: 2021-07-12