Pedigree reconstruction from SNP data: parentage assignment, sibship clustering and beyond.
Molecular Ecology Resources (IF 5.5), Pub Date: 2017-03-09, DOI: 10.1111/1755-0998.12665
Jisca Huisman

Data on hundreds or thousands of single nucleotide polymorphisms (SNPs) provide detailed information about the relationships between individuals, but currently few tools can turn this information into a multigenerational pedigree. I present the R package sequoia, which assigns parents, clusters half-siblings sharing an unsampled parent and assigns grandparents to half-sibships. Assignments are made after consideration of the likelihoods of all possible first-, second- and third-degree relationships between the focal individuals, as well as the traditional alternative of being unrelated. This careful exploration of the local likelihood surface is implemented in a fast, heuristic hill-climbing algorithm. Distinction between the various categories of second-degree relatives is possible when likelihoods are calculated conditional on at least one parent of each focal individual. Performance was tested on simulated data sets with realistic genotyping error rate and missingness, based on three different large pedigrees (N = 1000-2000). This included a complex pedigree with overlapping generations, occasional close inbreeding and some unknown birth years. Parentage assignment was highly accurate down to about 100 independent SNPs (error rate <0.1%) and fast (<1 min), as most pairs can be excluded from being parent-offspring based on opposite homozygosity. For full pedigree reconstruction, 40% of parents were assumed nongenotyped. Reconstruction resulted in low error rates (<0.3%) and high assignment rates (>99%) in limited computation time (typically <1 h) when at least 200 independent SNPs were used. In three empirical data sets, relatedness estimated from the inferred pedigree was strongly correlated to genomic relatedness.
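
The exclusion step mentioned in the abstract, ruling out parent-offspring pairs by opposite homozygosity, can be illustrated with a minimal Python sketch. This is not the sequoia implementation: the function names, the 0/1/2 allele-count coding, the -9 missing code and the `max_mismatch` tolerance are all assumptions made for illustration. The idea is that a parent and its offspring must share at least one allele at every locus, so a pair in which one individual is homozygous for one allele and the other is homozygous for the alternative allele at many loci is unlikely to be parent and offspring. A small tolerance allows for genotyping error.

```python
from itertools import combinations

MISSING = -9  # assumed code for a missing genotype call


def opposite_homozygous_count(geno_a, geno_b):
    """Count loci where one genotype is 0 and the other is 2.

    Genotypes are assumed to be coded as copies of a reference allele
    (0, 1 or 2); loci with a missing call in either individual never match
    the (0, 2) pattern and are therefore ignored.
    """
    if len(geno_a) != len(geno_b):
        raise ValueError("genotype vectors must have the same length")
    return sum(1 for a, b in zip(geno_a, geno_b) if {a, b} == {0, 2})


def candidate_pairs(genotypes, max_mismatch=1):
    """Return pairs of IDs that are NOT excluded as parent-offspring.

    `genotypes` maps an individual ID to its genotype vector.
    `max_mismatch` is a hypothetical tolerance for genotyping error.
    """
    kept = []
    for id_a, id_b in combinations(genotypes, 2):
        n_opposite = opposite_homozygous_count(genotypes[id_a], genotypes[id_b])
        if n_opposite <= max_mismatch:
            kept.append((id_a, id_b))
    return kept


# Toy data: six SNPs for three individuals (made up for illustration).
genotypes = {
    "parent":    [0, 2, 1, 2, 0, 1],
    "offspring": [1, 2, 0, 1, 0, MISSING],
    "unrelated": [2, 0, 1, 0, 2, 1],
}

print(candidate_pairs(genotypes, max_mismatch=1))
```

In this toy data set only the parent-offspring pair survives the filter; the other two pairs show two or more opposite-homozygous loci. A cheap filter of this kind leaves far fewer pairs for the more expensive likelihood comparisons, which is consistent with the abstract's explanation of why parentage assignment is fast.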

Updated: 2019-11-01