locStra: Fast analysis of regional/global stratification in whole‐genome sequencing studies
Genetic Epidemiology (IF 1.7), Pub Date: 2020-09-14, DOI: 10.1002/gepi.22356
Georg Hahn 1, Sharon M Lutz 1, Julian Hecker 2, Dmitry Prokopenko 3, Michael H Cho 2, Edwin K Silverman 2, Scott T Weiss 2, Christoph Lange 1

locStra is an R package for the analysis of regional and global population stratification in whole‐genome sequencing (WGS) studies, where regional stratification refers to the substructure defined by the loci in a particular region of the genome. Population substructure can be assessed based on the genetic covariance matrix, the genomic relationship matrix, and the unweighted/weighted genetic Jaccard similarity matrix. Using a sliding window approach, the regional similarity matrices are compared with the global ones, based on user‐defined window sizes and metrics, for example, the correlation between regional and global eigenvectors. An algorithm for the specification of the window size is provided. As the implementation fully exploits sparse matrix algebra and is written in C++, the analysis is highly efficient. Even on single cores, for realistic study sizes (several thousand subjects, several million rare variants per subject), the runtime for the genome‐wide computation of all regional similarity matrices typically does not exceed one hour, enabling an unprecedented investigation of regional stratification across the entire genome. The package is applied to three WGS studies, illustrating the varying patterns of regional substructure across the genome and its beneficial effects on association testing.
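
The sliding window comparison described in the abstract can be illustrated with a minimal, pure-Python sketch. This is not locStra's API or implementation: the function names (`jaccard_matrix`, `leading_eigenvector`, `regional_vs_global`), the use of non-overlapping windows, the power iteration, and the convention that two subjects carrying no variants have similarity 1 are all assumptions made for illustration. Each subject is stored as the set of variant indices they carry, a simple sparse representation of a binary genotype matrix.

```python
import math


def jaccard_matrix(carriers):
    """Unweighted Jaccard similarity between subjects.

    carriers: list of sets; carriers[i] holds the variant indices carried
    by subject i (a sparse encoding of one row of a 0/1 genotype matrix).
    """
    n = len(carriers)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            union = len(carriers[i] | carriers[j])
            # Assumed convention: two subjects without any variants are identical.
            value = len(carriers[i] & carriers[j]) / union if union else 1.0
            sim[i][j] = sim[j][i] = value
    return sim


def leading_eigenvector(mat, iters=500):
    """Approximate the first eigenvector by power iteration (illustrative only)."""
    n = len(mat)
    v = [1.0 + 0.01 * i for i in range(n)]  # deterministic starting vector
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return v
        v = [x / norm for x in w]
    return v


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0.0 and sy > 0.0 else 0.0


def regional_vs_global(carriers, n_variants, window):
    """Correlate each window's first eigenvector with the global one.

    Returns a list of (window_start, |correlation|) pairs. The absolute
    value is taken because the sign of an eigenvector is arbitrary.
    """
    global_vec = leading_eigenvector(jaccard_matrix(carriers))
    results = []
    for start in range(0, n_variants, window):
        region = set(range(start, min(start + window, n_variants)))
        regional = [c & region for c in carriers]  # restrict to this window
        vec = leading_eigenvector(jaccard_matrix(regional))
        results.append((start, abs(pearson(vec, global_vec))))
    return results


if __name__ == "__main__":
    # Toy data: five subjects, eight variant positions (made up for illustration).
    toy = [{0, 1, 2}, {0, 1, 3}, {4, 5}, {4, 6, 7}, {0, 4}]
    for start, corr in regional_vs_global(toy, n_variants=8, window=4):
        print(start, round(corr, 3))
```

Windows whose correlation with the global eigenvector is low would be candidates for regional substructure that differs from the genome-wide pattern. At the scale quoted in the abstract, the dense nested loops above would be far too slow, which is where a sparse matrix implementation becomes essential.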

Updated: 2020-09-14