SAIGEgds - an efficient statistical tool for large-scale PheWAS with mixed models.
Bioinformatics (IF 4.4), Pub Date: 2020-09-08, DOI: 10.1093/bioinformatics/btaa731
Xiuwen Zheng, J Wade Davis

Phenome-wide association studies (PheWAS) are a powerful tool for the discovery and replication of genetic associations. To reduce the computational burden of PheWAS in large cohorts such as the UK Biobank, the SAIGE method was proposed to control for case-control imbalance and sample relatedness in a tractable manner. However, SAIGE is still computationally intensive when used to analyze the associations of thousands of ICD10-coded phenotypes with whole-genome imputed genotype data. Here we present a new high-performance statistical R package (SAIGEgds) for large-scale PheWAS using generalized linear mixed models. The package implements the SAIGE method in optimized C++ code, taking advantage of sparse genotype dosages and integrating the efficient genomic data structure (GDS) file format. Benchmarks using the UK Biobank White British genotype data (N ≈ 430K) with coronary heart disease and simulated cases show that the implementation in SAIGEgds is 5 to 6 times faster than the SAIGE R package. When used in conjunction with high-performance computing clusters, SAIGEgds provides an efficient analysis pipeline for biobank-scale PheWAS.
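
The abstract credits part of the speedup to sparse genotype dosages. The sketch below is only an illustration of that general idea, not the SAIGEgds implementation: for a rare variant most samples carry a dosage of zero, so a sum of the form Σ g_i · r_i (dosage times a per-sample residual) only needs to visit the non-zero entries. The function names and the toy data are hypothetical and chosen for this example.

```python
# Illustrative sketch only: why sparse dosages reduce work.
# Names and data are hypothetical, not taken from SAIGEgds.

def score_dense(dosages, residuals):
    """Sum g_i * r_i over every sample, including zero dosages."""
    return sum(g * r for g, r in zip(dosages, residuals))

def to_sparse(dosages):
    """Keep only (sample index, dosage) pairs with a non-zero dosage."""
    return [(i, g) for i, g in enumerate(dosages) if g != 0]

def score_sparse(sparse_dosages, residuals):
    """Same sum, but visiting only the non-zero dosages."""
    return sum(g * residuals[i] for i, g in sparse_dosages)

# Toy data: 8 samples, only 2 carry the alternate allele.
dosages = [0, 0, 1, 0, 2, 0, 0, 0]
residuals = [0.1, -0.2, 0.3, 0.05, -0.4, 0.2, -0.1, 0.15]

sparse = to_sparse(dosages)
dense_result = score_dense(dosages, residuals)
sparse_result = score_sparse(sparse, residuals)
```

Both functions return the same value up to floating-point rounding, but the sparse version touches 2 entries instead of 8. With rare variants in a cohort of hundreds of thousands of samples, the gap between the two grows accordingly.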

Updated: 2020-09-08