Assembly and Analysis of Unmapped Genome Sequence Reads Reveal Novel Sequence and Variation in Dogs.
Scientific Reports ( IF 4.6 ) Pub Date : 2018-Jul-18 , DOI: 10.1038/s41598-018-29190-3
Lindsay A. Holden , Meharji Arumilli , Marjo K. Hytönen , Sruthi Hundi , Jarkko Salojärvi , Kim H. Brown , Hannes Lohi

Dogs are excellent animal models for human disease. They have extensive veterinary histories, pedigrees, and a unique genetic system due to breeding practices. Despite these advantages, one factor limiting their usefulness is the canine genome reference (CGR), which was assembled using a single purebred Boxer. Although a common practice, this results in many high-quality reads remaining unmapped. To address this, whole-genome sequence data from three breeds, Border Collie (n = 26), Bearded Collie (n = 7), and Entlebucher Sennenhund (n = 8), were analyzed to identify novel, non-CGR genomic contigs using the previously validated pseudo-de novo assembly pipeline. We identified 256,957 novel contigs; paired-end relationships together with BLAT scores provided 126,555 (49%) high-quality contigs with genomic coordinates, containing 4.6 Mb of novel sequence absent from the CGR. These contigs close 12,503 known gaps, including 2.4 Mb containing partially missing sequences for 11.5% of Ensembl, 16.4% of RefSeq and 12.2% of canFam3.1+ CGR annotated genes and 1,748 unmapped contigs containing 2,366 novel gene variants. Examples for six disease-associated genes (SCARF2, RD3, COL9A3, FAM161A, RASGRP1 and DLX6) containing gaps or alternate splice variants missing from the CGR are also presented. These findings from non-reference breeds support the need for improvement of the current Boxer-only CGR to avoid missing important biological information. The inclusion of the missing gene sequences into the CGR will facilitate identification of putative disease mutations across diverse breeds and phenotypes.

Updated: 2018-07-19