Comparison between instrumental variable and mediation-based methods for reconstructing causal gene networks in yeast
bioRxiv - Bioinformatics Pub Date : 2020-12-10 , DOI: 10.1101/2020.10.13.337501
Adriaan-Alexander Ludl , Tom Michoel

Causal gene networks model the flow of information within a cell. Reconstructing causal networks from omics data is challenging because correlation does not imply causation. When genomics and transcriptomics data from a segregating population are combined, genomic variants can be used to orient the direction of causality between gene expression traits. Instrumental variable methods use a local expression quantitative trait locus (eQTL) as a randomized instrument for a gene's expression level, and assign target genes based on distal eQTL associations. Mediation-based methods additionally require that distal eQTL associations are mediated by the source gene. A detailed comparison between these methods has not yet been conducted, due to the lack of a standardized implementation of different methods, the limited sample size of most multi-omics datasets, and the absence of ground-truth networks for most organisms. Here we used Findr, a software providing uniform implementations of instrumental variable, mediation, and coexpression-based methods, a recent dataset of 1,012 segregants from a cross between two budding yeast strains, and the YEASTRACT database of known transcriptional interactions to compare causal gene network inference methods. We found that causal inference methods result in a significant overlap with the ground-truth, whereas coexpression did not perform better than random. A subsampling analysis revealed that the performance of mediation saturates at large sample sizes, due to a loss of sensitivity when residual correlations become significant. Instrumental variable methods on the other hand contain false positive predictions, due to genomic linkage between eQTL instruments. Instrumental variable and mediation-based methods also have complementary roles for identifying causal genes underlying transcriptional hotspots. 
Instrumental variable methods correctly predicted STB5 targets for a hotspot centred on the transcription factor STB5, whereas mediation failed due to Stb5p auto-regulating its own expression. Mediation suggests a new candidate gene, DNM1, for a hotspot on Chr XII, where instrumental variable methods could not distinguish between multiple genes located within the hotspot. In conclusion, causal inference from genomics and transcriptomics data is a powerful approach for reconstructing causal gene networks, which could be further improved by the development of methods to control for residual correlations in mediation analyses and genomic linkage and pleiotropic effects from transcriptional hotspots in instrumental variable analyses.
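
The contrast between the two families of methods can be illustrated with a small simulation. The sketch below is not Findr's implementation and does not use its API. It is a toy model under stated assumptions: a binary local eQTL genotype `e`, a source gene `a`, a target gene `b`, and a hidden confounder `h` that affects both genes. The function names (`simulate`, `wald_ratio`, `partial_corr`) and all effect sizes are hypothetical choices made for illustration only.

```python
import numpy as np


def simulate(n, gamma=0.5, confounding=1.0, seed=0):
    """Toy model (an assumption, not the paper's data): e -> a -> b, with h -> a and h -> b."""
    rng = np.random.default_rng(seed)
    e = rng.integers(0, 2, size=n).astype(float)  # binary genotype at the local eQTL
    h = rng.normal(size=n)                        # hidden confounder of a and b
    a = 1.0 * e + confounding * h + rng.normal(size=n)
    b = gamma * a + confounding * h + rng.normal(size=n)
    return e, a, b


def wald_ratio(e, a, b):
    """Instrumental variable estimate of the a -> b effect: cov(e, b) / cov(e, a)."""
    return np.cov(e, b)[0, 1] / np.cov(e, a)[0, 1]


def partial_corr(x, y, z):
    """Correlation between x and y after regressing both on z (with an intercept)."""
    design = np.column_stack([np.ones_like(z), z])
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]


if __name__ == "__main__":
    for c in (0.0, 1.0):
        e, a, b = simulate(20000, confounding=c)
        print(f"confounding={c}: IV estimate={wald_ratio(e, a, b):.3f}, "
              f"partial corr(e, b | a)={partial_corr(e, b, a):.3f}")
```

In this toy model the instrumental variable estimate stays close to the simulated effect `gamma` with or without the confounder, because the genotype is independent of `h`. A mediation-style check expects the genotype and the target to be uncorrelated once the source gene is conditioned on. That holds without confounding, but with a hidden confounder conditioning on `a` leaves a residual correlation between `e` and `b`, which becomes statistically detectable as the sample size grows. This is one way to picture the loss of sensitivity of mediation at large sample sizes described in the abstract.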

Updated: 2020-12-11