Inherent Nonlinear Distribution of High-Dimensional Genotypic Data Identified as a Possible Source of Confounding Factors in Population Structure Analysis
IEEE/ACM Transactions on Computational Biology and Bioinformatics ( IF 4.5 ) Pub Date : 2021-03-30 , DOI: 10.1109/tcbb.2021.3069503
Meng Wang

Detecting and correcting for population structure has become routine work in genome-wide association analysis. A variety of methods have been proposed, and those based on spectral graph theory in particular have shown superior performance. We found that the inherent nonlinear distribution of high-dimensional genotypic data is a possible source of confounding factors in population structure analysis, and may also be the underlying reason for the superiority of these spectral methods. We tested this hypothesis by validating a variation of Laplacian eigenanalysis, LAPMAP. The method faithfully revealed the underlying population structures of the HapMap II and III data sets. The inferred top eigenvectors, together with minor eigenvectors, were used to segregate samples by ancestry. In the phase II data set, the top 3 eigenvectors differentiated the 4 populations, grouping them into 4 clusters that reflected their continental origins. In the phase III data set, 9 populations were well recognized. Next, we estimated admixture proportions for simulated individuals. The method showed comparable or better performance in capturing and correcting for modelled population structures. All experimental results showed that LAPMAP is robust, efficient and scalable to genome-wide association studies.
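
As a rough illustration of the kind of Laplacian eigenanalysis the abstract refers to, the sketch below embeds samples using the low eigenvectors of a normalized graph Laplacian built from genotype distances. It is a generic Laplacian eigenmap, not the authors' LAPMAP: the dense heat-kernel graph, the median-distance bandwidth, the toy simulation, and the names `simulate_two_populations` and `laplacian_eigenmap` are all assumptions made for this example.

```python
import numpy as np


def simulate_two_populations(n_per_pop=30, n_snps=300, seed=0):
    """Hypothetical toy data: two populations with independent allele frequencies."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.05, 0.95, size=(2, n_snps))
    # Genotypes are allele counts in {0, 1, 2}, drawn separately per population.
    blocks = [rng.binomial(2, f, size=(n_per_pop, n_snps)) for f in freqs]
    labels = np.repeat([0, 1], n_per_pop)
    return np.vstack(blocks).astype(float), labels


def laplacian_eigenmap(genotypes, n_components=2):
    """Generic Laplacian eigenmap of a samples-by-SNPs matrix (assumed design)."""
    n = genotypes.shape[0]
    # Pairwise squared Euclidean distances between samples.
    sq_norms = (genotypes ** 2).sum(axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * genotypes @ genotypes.T
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative round-off
    # Assumed bandwidth: median squared distance over distinct sample pairs.
    bandwidth = np.median(d2[np.triu_indices(n, k=1)])
    # Dense heat-kernel weights keep the graph connected; no self-loops.
    weights = np.exp(-d2 / (2.0 * bandwidth))
    np.fill_diagonal(weights, 0.0)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    inv_sqrt_deg = 1.0 / np.sqrt(weights.sum(axis=1))
    laplacian = np.eye(n) - inv_sqrt_deg[:, None] * weights * inv_sqrt_deg[None, :]
    # eigh returns eigenvalues in ascending order; index 0 is the trivial one.
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    keep = slice(1, n_components + 1)
    # Rescale by D^{-1/2} to obtain the random-walk embedding coordinates.
    return eigvals[keep], inv_sqrt_deg[:, None] * eigvecs[:, keep]


if __name__ == "__main__":
    genotypes, labels = simulate_two_populations()
    _, embedding = laplacian_eigenmap(genotypes, n_components=2)
    for pop in (0, 1):
        mean_coord = embedding[labels == pop, 0].mean()
        print(f"population {pop}: mean of first coordinate = {mean_coord:.4f}")
```

On strongly differentiated toy populations like these, the first non-trivial eigenvector tends to take opposite signs in the two groups. A real analysis would need to handle missing genotypes, linkage disequilibrium, and the choice of graph, none of which this sketch attempts.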

Updated: 2021-03-30