Unbalanced Sample Size Introduces Spurious Correlations to Genome-Wide Heterozygosity Analyses.
Human Heredity ( IF 1.8 ) Pub Date : 2020-06-15 , DOI: 10.1159/000507576
Li Liu 1, 2, 3 , Richard J Caselli 4

Excess of heterozygosity (H) is a widely used measure of the genetic diversity of a population. As high-throughput sequencing and genotyping data have become readily available, it has been applied to investigate associations of genome-wide genetic diversity with human diseases and traits. However, these studies often report contradictory results. In this paper, we present a meta-analysis of five whole-exome studies to examine the association of H scores with Alzheimer’s disease. We show that the mean H score of a group is not associated with disease status, but it is associated with sample size. Across all five studies, the group with more samples has a significantly lower H score than the group with fewer samples. To remove potential confounders in empirical data sets, we perform computer simulations to create artificial genomes controlled for the number of polymorphic loci, the sample size, and the allele frequency. Analyses of these simulated data confirm the negative correlation between sample size and H score. Furthermore, we find that genomes with a large number of rare variants also have inflated H scores. Together, these biases can lead to spurious associations between genetic diversity and the phenotype of interest. Based on these findings, we recommend that studies balance sample sizes when using genome-wide H scores to assess the genetic diversity of different populations, which should help improve the reproducibility of future research.
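
The sample-size effect described above can be illustrated with a small simulation. The sketch below is not the authors' simulation code, and the abstract does not define the H score, so it assumes one common definition: pooled excess heterozygosity, (observed − expected) / expected, with the expected count computed from the sample allele frequencies. The function name `excess_heterozygosity` and all parameter values are hypothetical. Genotypes are drawn under Hardy-Weinberg equilibrium, so the true excess is zero; under these assumptions the plug-in expected heterozygosity is biased low by a factor of about (1 − 1/(2n)), which leaves a positive excess of roughly 1/(2n − 1) that shrinks as the sample size n grows.

```python
import random


def excess_heterozygosity(n_samples, n_loci, freq, seed=0):
    """Pooled excess heterozygosity, (observed - expected) / expected.

    Assumed definition (not taken from the paper). Genotypes are drawn
    under Hardy-Weinberg equilibrium, so the true excess is zero and any
    nonzero result reflects estimator bias or sampling noise.
    """
    rng = random.Random(seed)
    observed = 0.0  # heterozygous genotypes counted across all loci
    expected = 0.0  # heterozygotes expected from sample allele frequencies
    for _ in range(n_loci):
        het = 0  # heterozygous individuals at this locus
        alt = 0  # alternate alleles seen at this locus
        for _ in range(n_samples):
            # Each individual carries two independently drawn alleles.
            a = rng.random() < freq
            b = rng.random() < freq
            alt += a + b
            het += a != b
        p_hat = alt / (2 * n_samples)  # sample allele frequency
        observed += het
        expected += 2 * p_hat * (1 - p_hat) * n_samples
    if expected == 0:
        raise ValueError("no polymorphic loci in the sample")
    return (observed - expected) / expected


if __name__ == "__main__":
    # Same allele frequency and number of loci; only the sample size varies.
    for n in (10, 50, 100):
        h = excess_heterozygosity(n_samples=n, n_loci=2000, freq=0.3, seed=1)
        print(f"n={n:4d}  H={h:+.4f}  approx. bias 1/(2n-1)={1 / (2 * n - 1):.4f}")
```

Because the two groups in a case-control comparison rarely have the same n, a bias of this kind alone could produce a group difference in H that has nothing to do with the phenotype, which is consistent with the recommendation to balance sample sizes.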


Updated: 2020-06-15