locStra: Fast analysis of regional/global stratification in whole genome sequencing (WGS) studies
bioRxiv - Genetics. Pub Date: 2020-08-06. DOI: 10.1101/2020.03.06.981050
Georg Hahn, Sharon M. Lutz, Julian Hecker, Dmitry Prokopenko, Michael H. Cho, Edwin K. Silverman, Scott T. Weiss, Christoph Lange

locStra is an R package for the analysis of regional and global population stratification in whole genome sequencing (WGS) studies, where regional stratification refers to the substructure defined by the loci in a particular region of the genome. Population substructure can be assessed based on the genetic covariance matrix, the genomic relationship matrix, and the unweighted/weighted genetic Jaccard similarity matrix. Using a sliding-window approach, the regional similarity matrices are compared to the global ones, based on user-defined window sizes and metrics, e.g. the correlation between regional and global eigenvectors. An algorithm for selecting the window size is provided. Because the implementation fully exploits sparse matrix algebra and is written in C++, the analysis is highly efficient. Even on a single core, for realistic study sizes (several thousand subjects, several million rare variants (RVs) per subject), the runtime for the genome-wide computation of all regional similarity matrices typically does not exceed one hour, enabling an unprecedented investigation of regional stratification across the entire genome. The package is applied to three WGS studies, illustrating the varying patterns of regional substructure across the genome and its beneficial effects on association testing.
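
To make the sliding-window comparison concrete, here is a minimal pure-Python sketch. It is not locStra's implementation or API: the function names (`jaccard_matrix`, `leading_eigenvector`, `abs_correlation`, `regional_vs_global`) are hypothetical, the matrices are dense rather than sparse, the windows are assumed to be non-overlapping, and a simple power iteration stands in for a proper eigendecomposition. It builds the unweighted Jaccard similarity matrix between subjects, then reports, for each window, the absolute correlation between the leading regional eigenvector and the leading global one.

```python
import math


def jaccard_matrix(genotypes):
    """Unweighted Jaccard similarity between subjects.

    genotypes: list of loci, each a list of 0/1 variant indicators per subject.
    """
    n = len(genotypes[0])
    # Set of locus indices at which each subject carries a variant.
    carriers = [{k for k, locus in enumerate(genotypes) if locus[i]}
                for i in range(n)]
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            union = len(carriers[i] | carriers[j])
            # Assumed convention: similarity is 0 when neither subject
            # carries any variant in the region.
            value = len(carriers[i] & carriers[j]) / union if union else 0.0
            sim[i][j] = sim[j][i] = value
    return sim


def leading_eigenvector(matrix, iterations=500):
    """Approximate the leading eigenvector by power iteration."""
    n = len(matrix)
    vec = [1.0 + 0.01 * i for i in range(n)]  # arbitrary non-zero start
    for _ in range(iterations):
        nxt = [sum(row[j] * vec[j] for j in range(n)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break  # zero matrix: keep the previous vector
        vec = [x / norm for x in nxt]
    return vec


def abs_correlation(u, v):
    """Absolute Pearson correlation; the sign of an eigenvector is arbitrary."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su < 1e-12 or sv < 1e-12:
        return 0.0  # correlation is undefined for a constant vector
    return abs(cov / (su * sv))


def regional_vs_global(genotypes, window):
    """Compare each window's leading eigenvector with the global one."""
    global_vec = leading_eigenvector(jaccard_matrix(genotypes))
    results = []
    for start in range(0, len(genotypes), window):
        region = genotypes[start:start + window]
        regional_vec = leading_eigenvector(jaccard_matrix(region))
        results.append((start, abs_correlation(global_vec, regional_vec)))
    return results


# Toy data (made up for illustration): 6 loci (rows) x 4 subjects (columns).
toy = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
for start, corr in regional_vs_global(toy, window=3):
    print(f"window starting at locus {start}: |correlation| = {corr:.3f}")
```

Windows whose correlation with the global eigenvector is low are the ones where regional substructure departs from the genome-wide pattern. At the study sizes the abstract mentions, the dense double loop above would be far too slow, which is presumably why the package relies on sparse matrix algebra in C++.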

Updated: 2020-08-06