Scalable generalized linear mixed model for region-based association tests in large biobanks and cohorts

Nature Genetics (IF 31.7), Pub Date: 2020-05-18, DOI: 10.1038/s41588-020-0621-6
Wei Zhou (1,2,3,4), Zhangchen Zhao (1,5), Jonas B Nielsen (6), Lars G Fritsche (1,5), Jonathon LeFaive (1,5), Sarah A Gagliano Taliun (1,5), Wenjian Bi (1,5), Maiken E Gabrielsen (7), Mark J Daly (2,3,4,8), Benjamin M Neale (2,3,4), Kristian Hveem (7,9), Goncalo R Abecasis (1,5), Cristen J Willer (6,10,11), Seunggeun Lee (1,5,12)

Affiliations

1. Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA.
2. Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
3. Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
4. Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
5. Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA.
6. Division of Cardiology, Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, MI, USA.
7. K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, Norwegian University of Science and Technology (NTNU), Trondheim, Norway.
8. Institute for Molecular Medicine Finland, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki, Finland.
9. HUNT Research Centre, Department of Public Health and Nursing, Norwegian University of Science and Technology (NTNU), Levanger, Norway.
10. Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
11. Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, USA.
12. Graduate School of Data Science, Seoul National University, Seoul, Republic of Korea.

With very large sample sizes, biobanks provide an exciting opportunity to identify genetic components of complex traits. To analyze rare variants, region-based multiple-variant aggregate tests are commonly used to increase power for association tests. However, because of the substantial computational cost, existing region-based tests cannot analyze hundreds of thousands of samples while accounting for confounders such as population stratification and sample relatedness. Here we propose a scalable generalized mixed-model region-based association test, SAIGE-GENE, that is applicable to exome-wide and genome-wide region-based analysis for hundreds of thousands of samples and can account for unbalanced case-control ratios for binary traits. Through extensive simulation studies and analysis of the HUNT study with 69,716 Norwegian samples and the UK Biobank data with 408,910 White British samples, we show that SAIGE-GENE can efficiently analyze large-sample data (N > 400,000) with type I error rates well controlled.

Updated: 2020-05-18
