Some statistical consideration in transcriptome-wide association studies.
Genetic Epidemiology ( IF 1.7 ) Pub Date : 2019-12-10 , DOI: 10.1002/gepi.22274
Haoran Xue, Wei Pan

The methodology of transcriptome-wide association studies (TWAS) has become popular for integrating a reference expression quantitative trait locus (eQTL) data set with an independent main GWAS data set to identify (putatively) causal genes, shedding mechanistic light on the biological pathways from genetic variants to a GWAS trait as mediated by gene expression. Statistically, TWAS is a (two-sample) two-stage least squares (2SLS) method in the framework of instrumental variables analysis for causal inference: in Stage 1 it uses the reference eQTL data to impute a gene's expression for the main GWAS data, and in Stage 2 it tests for association between the imputed gene expression and the GWAS trait; if an association is detected in Stage 2, a (putatively) causal relationship between the gene and the GWAS trait is claimed. If a nonlinear model or a generalized linear model (GLM) is fitted in Stage 2 (e.g., for a binary GWAS trait), it is known that using only the imputed gene expression, as in standard TWAS, does not in general lead to a consistent (i.e., asymptotically unbiased) estimate of the causal effect; accordingly, a variation of 2SLS, called two-stage residual inclusion (2SRI), has been proposed to yield better estimates (e.g., consistent under suitable conditions). Our main goal is to investigate whether it is necessary, or even better, to apply 2SRI instead of the standard 2SLS. In addition, because imputed gene expression is used (i.e., a quantity measured with error), it is known that some correction to the standard error estimate of the causal effect estimate is in general required, yet standard TWAS applies no such correction. Is this an issue? We also compare one-sample 2SLS with two-sample 2SLS (i.e., the standard TWAS). We used the Alzheimer's Disease Neuroimaging Initiative (ADNI) data and simulated data mimicking the ADNI data to address the above questions. In the end, we conclude that, in practice, with the large sample sizes and small effect sizes of genetic variants, the standard TWAS performs well and is recommended.
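
The difference between 2SLS and 2SRI described in the abstract can be illustrated with a minimal sketch. It is not the authors' code or simulation design: the sample size, number of SNPs, effect sizes, the confounder `U`, and the helper names `ols`, `logistic_fit`, and `add_intercept` are all assumptions made for illustration. Because 2SRI needs the Stage 1 residual, the sketch assumes gene expression is measured in the same sample as the trait (a one-sample setting).

```python
import numpy as np

def add_intercept(M):
    """Prepend a column of ones to a vector or matrix of predictors."""
    return np.column_stack([np.ones(M.shape[0]), M])

def ols(X, y):
    """Least-squares coefficients for y ~ X (X already has an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def logistic_fit(X, y, n_iter=50):
    """Logistic regression by Newton-Raphson; X already has an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

# Simulated data (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
n, n_snps = 5000, 5
Z = rng.binomial(2, 0.3, size=(n, n_snps)).astype(float)  # SNP genotypes (instruments)
U = rng.normal(size=n)                                    # unmeasured confounder
X = Z @ np.full(n_snps, 0.2) + U + rng.normal(size=n)     # gene expression
Y_lin = 0.3 * X + U + rng.normal(size=n)                  # quantitative trait
prob = 1.0 / (1.0 + np.exp(-(-0.5 + 0.3 * X + U)))
Y_bin = rng.binomial(1, prob).astype(float)               # binary trait

# Stage 1: regress expression on genotypes, keep fitted values and residuals.
Z1 = add_intercept(Z)
X_hat = Z1 @ ols(Z1, X)
resid = X - X_hat

# Stage 2, 2SLS: the trait is regressed on the imputed expression only.
# Stage 2, 2SRI: the trait is regressed on observed expression plus the residual.
D_2sls = add_intercept(X_hat)
D_2sri = add_intercept(np.column_stack([X, resid]))

beta_2sls_lin = ols(D_2sls, Y_lin)[1]
beta_2sri_lin = ols(D_2sri, Y_lin)[1]
beta_2sls_bin = logistic_fit(D_2sls, Y_bin)[1]
beta_2sri_bin = logistic_fit(D_2sri, Y_bin)[1]

print("linear   2SLS vs 2SRI:", beta_2sls_lin, beta_2sri_lin)
print("logistic 2SLS vs 2SRI:", beta_2sls_bin, beta_2sri_bin)
```

With a linear Stage 2, the two estimates coincide, because the observed expression equals the fitted value plus a residual that is orthogonal to the fitted value. With a logistic Stage 2 they generally differ, which is the situation the abstract's comparison addresses.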

Updated: 2019-12-10