Computationally efficient whole-genome regression for quantitative and binary traits
Nature Genetics (IF 31.7), Pub Date: 2021-05-20, DOI: 10.1038/s41588-021-00870-7
Joelle Mbatchou, Leland Barnard, Joshua Backman, Anthony Marcketta, Jack A Kosmicki, Andrey Ziyatdinov, Christian Benner, Colm O'Dushlaine, Mathew Barber, Boris Boutkov, Lukas Habegger, Manuel Ferreira, Aris Baras, Jeffrey Reid, Goncalo Abecasis, Evan Maxwell, Jonathan Marchini

Genome-wide association analysis of cohorts with thousands of phenotypes is computationally expensive, particularly when accounting for sample relatedness or population structure. Here we present a novel machine-learning method called REGENIE for fitting a whole-genome regression model for quantitative and binary phenotypes that is substantially faster than alternatives in multi-trait analyses while maintaining statistical efficiency. The method naturally accommodates parallel analysis of multiple phenotypes and requires only local segments of the genotype matrix to be loaded in memory, in contrast to existing alternatives, which must load genome-wide matrices into memory. This results in substantial savings in compute time and memory usage. We introduce a fast, approximate Firth logistic regression test for unbalanced case–control phenotypes. The method is ideally suited to take advantage of distributed computing frameworks. We demonstrate the accuracy and computational benefits of this approach using the UK Biobank dataset with up to 407,746 individuals.
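The block-wise idea in the abstract, where only a local segment of the genotype matrix is held in memory at a time, can be illustrated with a small sketch. This is not the REGENIE implementation: it assumes a plain ridge regression fitted per block followed by a second ridge regression that combines the per-block predictors, and it omits the cross-validation, leave-one-chromosome-out handling, binary-trait support, and association testing that a real analysis would need. The names `load_block`, `blockwise_ridge_predictors`, and `combine_predictors` are hypothetical, and the data are simulated.

```python
import numpy as np


def blockwise_ridge_predictors(load_block, n_blocks, y, ridge_lambda=1.0):
    """Fit one ridge regression per genotype block.

    load_block(i) is a hypothetical callable returning an
    (n_samples, n_snps_in_block) array, so only one block is
    held in memory at any time.
    Returns an (n_samples, n_blocks) matrix of per-block predictors.
    """
    y_c = y - y.mean()
    predictors = np.empty((y.shape[0], n_blocks))
    for i in range(n_blocks):
        G = load_block(i)
        G = G - G.mean(axis=0)  # center each variant
        # Ridge solution: beta = (G'G + lambda * I)^{-1} G'y
        A = G.T @ G + ridge_lambda * np.eye(G.shape[1])
        beta = np.linalg.solve(A, G.T @ y_c)
        predictors[:, i] = G @ beta
    return predictors


def combine_predictors(predictors, y, ridge_lambda=1.0):
    """Combine per-block predictors with a second ridge regression."""
    y_c = y - y.mean()
    A = predictors.T @ predictors + ridge_lambda * np.eye(predictors.shape[1])
    w = np.linalg.solve(A, predictors.T @ y_c)
    return predictors @ w


# Simulated example: 200 samples, 4 blocks of 50 variants each.
rng = np.random.default_rng(0)
n, block_size, n_blocks = 200, 50, 4
genotypes = rng.integers(0, 3, size=(n, block_size * n_blocks)).astype(float)
effects = rng.normal(scale=0.1, size=genotypes.shape[1])
y = genotypes @ effects + rng.normal(size=n)


def load_block(i):
    # Stand-in for reading one segment of a genotype file from disk.
    return genotypes[:, i * block_size:(i + 1) * block_size]


P = blockwise_ridge_predictors(load_block, n_blocks, y)
fitted = combine_predictors(P, y)
```

Because each block is reduced to a single predictor column before the combining step, peak memory in this sketch scales with the block size rather than with the number of variants genome-wide. The fitted values here are in-sample, so they overstate predictive accuracy.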




Updated: 2021-05-20