PLOS Genetics (IF 4.5), Pub Date: 2020-06-15, DOI: 10.1371/journal.pgen.1008855
Wei Cheng, Sohini Ramachandran, Lorin Crawford
Traditional univariate genome-wide association studies generate false positives and negatives due to difficulties distinguishing associated variants from variants with spurious nonzero effects that do not directly influence the trait. Recent efforts have been directed at identifying genes or signaling pathways enriched for mutations in quantitative traits or case-control studies, but these can be computationally costly and hampered by strict model assumptions. Here, we present gene-ε, a new approach for identifying statistical associations between sets of variants and quantitative traits. Our key insight is that enrichment studies on the gene-level are improved when we reformulate the genome-wide SNP-level null hypothesis to identify spurious small-to-intermediate SNP effects and classify them as non-causal. gene-ε efficiently identifies enriched genes under a variety of simulated genetic architectures, achieving greater than a 90% true positive rate at 1% false positive rate for polygenic traits. Lastly, we apply gene-ε to summary statistics derived from six quantitative traits using European-ancestry individuals in the UK Biobank, and identify enriched genes that are in biologically relevant pathways.
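The reformulated null hypothesis described above can be illustrated with a small sketch. This is not the gene-ε implementation: it assumes SNPs within a gene are independent (no linkage disequilibrium), that a null threshold `sigma_eps` has already been chosen, and that the gene-level statistic is a plain sum of squared effect sizes. The function names `gene_statistic` and `gene_pvalue` are hypothetical. The point it shows is that a gene is tested against "effects no larger than spurious noise of scale `sigma_eps`" rather than against "all effects exactly zero".

```python
import random


def gene_statistic(betas):
    """Sum of squared SNP effect sizes for one gene (illustrative statistic)."""
    return sum(b * b for b in betas)


def gene_pvalue(betas, sigma_eps, n_draws=2000, seed=0):
    """Monte Carlo p-value for one gene.

    Null hypothesis (an assumption of this sketch): every SNP effect in the
    gene is drawn independently from N(0, sigma_eps^2), i.e. effects are
    allowed to be nonzero but only at the 'spurious' scale sigma_eps.
    """
    rng = random.Random(seed)
    observed = gene_statistic(betas)
    exceed = 0
    for _ in range(n_draws):
        # Simulate one gene's worth of spurious effects under the null.
        null_betas = [rng.gauss(0.0, sigma_eps) for _ in betas]
        if gene_statistic(null_betas) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_draws + 1)


if __name__ == "__main__":
    sigma_eps = 0.05  # hypothetical scale of spurious effects
    strong_gene = [0.5, 0.4, 0.6]       # effects far above the null scale
    weak_gene = [0.01, -0.02, 0.015]    # effects within the null scale
    print("strong gene p-value:", gene_pvalue(strong_gene, sigma_eps))
    print("weak gene p-value:", gene_pvalue(weak_gene, sigma_eps))
```

Under a conventional null of exactly zero effects, the weak gene above would still register as nonzero; with a positive `sigma_eps`, its small effects are treated as non-causal and only the strong gene stands out. A realistic analysis would also need to account for correlation between SNPs and to estimate the threshold from the data.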
Title: Estimation of non-null SNP effect size distributions enables the detection of enriched genes underlying complex traits