Some statistical consideration in transcriptome-wide association studies.
Genetic Epidemiology (IF 1.7), Pub Date: 2019-12-10, DOI: 10.1002/gepi.22274
Haoran Xue, Wei Pan

The methodology of transcriptome-wide association studies (TWAS) has become popular for integrating a reference expression quantitative trait loci (eQTL) data set with an independent main GWAS data set to identify (putatively) causal genes, shedding mechanistic insight into biological pathways from genetic variants to a GWAS trait mediated by gene expression. Statistically, TWAS is a (two-sample) two-stage least squares (2SLS) method in the framework of instrumental variables analysis for causal inference: in Stage 1 it uses the reference eQTL data to impute a gene's expression for the main GWAS data; then in Stage 2 it tests for association between the imputed gene expression and the GWAS trait. If an association is detected in Stage 2, a (putatively) causal relationship between the gene and the GWAS trait is claimed. If a nonlinear model or a generalized linear model (GLM) is fitted in Stage 2 (e.g., for a binary GWAS trait), it is known that using only the imputed gene expression, as in standard TWAS, in general does not yield a consistent (i.e., asymptotically unbiased) estimate of the causal effect; accordingly, a variant of 2SLS called two-stage residual inclusion (2SRI) has been proposed to yield better estimates (e.g., ones that are consistent under suitable conditions). Our main goal is to investigate whether it is necessary, or even better, to apply 2SRI instead of the standard 2SLS. In addition, because imputed gene expression (i.e., expression with measurement error) is used, it is known that in general some correction to the standard error of the causal effect estimate has to be applied, whereas standard TWAS applies no correction. Is this an issue? We also compare one-sample 2SLS with two-sample 2SLS (i.e., the standard TWAS). We used the Alzheimer's Disease Neuroimaging Initiative (ADNI) data and simulated data mimicking the ADNI data to address the above questions.
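As a rough illustration of the two stages described above, the following is a toy sketch of two-sample 2SLS (standard TWAS) that uses a single cis-SNP as the only instrument and ordinary least squares in both stages. Practical TWAS methods usually impute expression from many SNPs with penalized regression; the single-SNP setup, the simulated effect sizes, and the function names `ols_simple`, `simulate`, and `twas_two_sample` are hypothetical choices made for illustration only. As in standard TWAS, the Stage 2 standard error is the naive OLS one, with no correction for imputation error.

```python
import math
import random


def ols_simple(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b, naive SE of b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, math.sqrt(rss / (n - 2) / sxx)


def simulate(n, w, beta, rng, maf=0.3):
    """Hypothetical data: SNP dosage g, expression x = w*g + noise, trait y = beta*x + noise."""
    g = [sum(rng.random() < maf for _ in range(2)) for _ in range(n)]
    x = [w * gi + rng.gauss(0, 1) for gi in g]
    y = [beta * xi + rng.gauss(0, 1) for xi in x]
    return g, x, y


def twas_two_sample(g_ref, x_ref, g_gwas, y_gwas):
    # Stage 1: regress expression on the SNP in the reference eQTL sample.
    a1, w_hat, _ = ols_simple(g_ref, x_ref)
    # Impute expression in the GWAS sample, where expression is not measured.
    x_hat = [a1 + w_hat * gi for gi in g_gwas]
    # Stage 2: regress the GWAS trait on imputed expression (naive SE, no correction).
    _, beta_hat, se = ols_simple(x_hat, y_gwas)
    return beta_hat, se, beta_hat / se


rng = random.Random(1)
g_ref, x_ref, _ = simulate(2000, w=1.0, beta=0.2, rng=rng)      # eQTL sample: trait unused
g_gwas, _, y_gwas = simulate(20000, w=1.0, beta=0.2, rng=rng)  # GWAS sample: expression unused
beta_hat, se, z = twas_two_sample(g_ref, x_ref, g_gwas, y_gwas)
print(f"beta_hat={beta_hat:.3f}  se={se:.4f}  z={z:.1f}")
```

Because the eQTL and GWAS samples are generated independently, this mirrors the two-sample design; the naive Stage 2 standard error ignores the sampling error in `w_hat` from Stage 1, which is the kind of uncorrected uncertainty the abstract asks about.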
In the end, we conclude that in practice, given large sample sizes and the small effect sizes of genetic variants, standard TWAS performs well and is recommended.
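To make the 2SLS versus 2SRI comparison concrete for a binary trait, here is a toy one-sample sketch: Stage 1 regresses expression on a single SNP; 2SLS then fits a logistic regression of the trait on the imputed expression only, while 2SRI fits the trait on observed expression plus the Stage 1 residual. The unmeasured confounder, the small effect size, the sample size, and the function names `solve`, `fit_logistic`, and `stage1` are hypothetical choices for illustration, not the authors' simulation design. Because this version of 2SRI uses each individual's Stage 1 residual, it needs expression and trait measured on the same individuals.

```python
import math
import random


def solve(a, b):
    """Solve the small linear system a @ x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (m[r][k] - sum(m[r][c] * x[c] for c in range(r + 1, k))) / m[r][r]
    return x


def fit_logistic(cols, y, iters=50, tol=1e-10):
    """Logistic regression by Newton-Raphson; cols are covariate lists, an intercept is added."""
    X = [[1.0] + list(row) for row in zip(*cols)]
    p = len(X[0])
    b = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(bj * xj for bj, xj in zip(b, xi))))
            wgt = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for l in range(p):
                    hess[j][l] += wgt * xi[j] * xi[l]
        step = solve(hess, grad)
        b = [bj + sj for bj, sj in zip(b, step)]
        if max(abs(s) for s in step) < tol:
            break
    return b


def stage1(g, x):
    """Stage 1 OLS of expression on SNP dosage; returns fitted values and residuals."""
    n = len(g)
    mg, mx = sum(g) / n, sum(x) / n
    w = sum((gi - mg) * (xi - mx) for gi, xi in zip(g, x)) / sum((gi - mg) ** 2 for gi in g)
    x_hat = [mx + w * (gi - mg) for gi in g]
    return x_hat, [xi - xh for xi, xh in zip(x, x_hat)]


rng = random.Random(7)
n, beta = 20000, 0.2                      # hypothetical sample size and small causal log-odds ratio
g = [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]   # hypothetical unmeasured confounder
x = [gi + ui + rng.gauss(0, 1) for gi, ui in zip(g, u)]
y = [1 if rng.random() < 1 / (1 + math.exp(-(-1 + beta * xi + ui))) else 0
     for xi, ui in zip(x, u)]

x_hat, resid = stage1(g, x)
b_2sls = fit_logistic([x_hat], y)[1]      # standard 2SLS: imputed expression only
b_2sri = fit_logistic([x, resid], y)[1]   # 2SRI: observed expression + Stage 1 residual
print(f"2SLS: {b_2sls:.3f}   2SRI: {b_2sri:.3f}   true: {beta}")
```

Varying `beta` and `n` in a sketch like this is one way to explore how far the two estimators diverge when effects are small; the printed values come from a single simulated data set and are not results from the paper.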

Updated: 2019-12-10