Calling large indels in 1047 Arabidopsis with IndelEnsembler
Nucleic Acids Research (IF 16.6), Pub Date: 2021-09-30, DOI: 10.1093/nar/gkab904
Dong-Xu Liu 1, 2 , Ramesh Rajaby 3, 4 , Lu-Lu Wei 1, 2 , Lei Zhang 5 , Zhi-Quan Yang 1, 2 , Qing-Yong Yang 1, 2, 3 , Wing-Kin Sung 1, 2, 3, 6

Large indels greatly affect observable phenotypes in many organisms, including plants and humans. Hence, detecting large indels with high precision and sensitivity is important. Here, we developed IndelEnsembler to detect large indels in whole-genome sequencing data from 1047 Arabidopsis accessions. IndelEnsembler identified 34 093 deletions, 12 913 tandem duplications and 9773 insertions. Our large indel dataset is more comprehensive and accurate than the previous AthCNV dataset (1). We captured nearly twice as many ground-truth deletions and, on average, 27% more ground-truth duplications than AthCNV, even though our dataset contains fewer large indels than AthCNV. Our large indels were positively correlated with transposable elements across the Arabidopsis genome. Non-homologous recombination events were the major formation mechanism of deletions in the Arabidopsis genome. The neighbor-joining (NJ) tree constructed from IndelEnsembler's deletions clearly separated the geographic subgroups of the 1047 Arabidopsis accessions. More importantly, our large indels represent a previously unassessed source of genetic variation. Approximately 49% of the deletions have low linkage disequilibrium (LD) with surrounding single nucleotide polymorphisms (SNPs), and some of them could affect trait performance. For instance, in a deletion-based genome-wide association study (DEL-GWAS), accessions carrying a 182-bp deletion in AT1G11520 showed delayed flowering, and all accessions from north Sweden carried this 182-bp deletion. We also found that accessions with a 65-bp deletion in the first exon of AT4G00650 (FRI) flowered earlier than those without it. These two deletions cannot be detected in AthCNV and, interestingly, they do not co-occur in any Arabidopsis thaliana accession. In SNP-GWAS, the SNPs surrounding these two deletions do not correlate with flowering time. This example demonstrates that existing large indel datasets miss phenotypic variation and that our large indel dataset fills this gap.
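
The low-LD observation above can be illustrated with a small sketch. It is not the authors' pipeline: it only shows one common way to measure LD, as the squared Pearson correlation (r²) between a deletion's presence/absence vector and each nearby SNP's genotype vector across accessions. The function names, the 0/1 homozygous encoding, and the `LOW_LD_THRESHOLD` value are all assumptions made for illustration; the abstract does not state which LD statistic, window, or cutoff was used.

```python
from typing import Sequence

# Hypothetical cutoff for calling a deletion "low LD"; not taken from the paper.
LOW_LD_THRESHOLD = 0.5


def r_squared(a: Sequence[int], b: Sequence[int]) -> float:
    """Squared Pearson correlation between two 0/1 genotype vectors.

    Each position is one accession; 1 means the variant allele is present.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    if var_a == 0 or var_b == 0:
        # Monomorphic site: r^2 is undefined; this sketch treats it as 0.
        return 0.0
    return cov * cov / (var_a * var_b)


def max_ld_with_snps(deletion: Sequence[int], snps: Sequence[Sequence[int]]) -> float:
    """Highest r^2 between the deletion and any of the surrounding SNPs."""
    return max((r_squared(deletion, snp) for snp in snps), default=0.0)


def is_low_ld(deletion: Sequence[int], snps: Sequence[Sequence[int]],
              threshold: float = LOW_LD_THRESHOLD) -> bool:
    """True if no surrounding SNP tags the deletion at or above the threshold."""
    return max_ld_with_snps(deletion, snps) < threshold
```

Under these assumptions, a deletion for which `is_low_ld` returns `True` is one that no nearby SNP tags well, which is the situation in which a SNP-only GWAS could miss an association that a DEL-GWAS detects.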

Updated: 2021-09-30