Large scale genome skimming from herbarium material for accurate plant identification and phylogenomics.
Plant Methods (IF 4.7), Pub Date: 2020-01-04, DOI: 10.1186/s13007-019-0534-5
Paul G Nevill 1, 2, 3 , Xiao Zhong 4, 5 , Julian Tonti-Filippini 4, 5 , Margaret Byrne 2, 6, 7 , Michael Hislop 6 , Kevin Thiele 2, 6 , Stephen van Leeuwen 6 , Laura M Boykin 4, 5 , Ian Small 4, 5

Background: Herbaria are valuable sources of extensive curated plant material that are now accessible to genetic studies because of advances in high-throughput, next-generation sequencing methods. As an applied assessment of large-scale recovery of plastid and ribosomal genome sequences from herbarium material for plant identification and phylogenomics, we sequenced 672 samples covering 21 families, 142 genera and 530 named and proposed named species. We explored the impact of parameters such as sample age, DNA concentration and quality, read depth and fragment length on plastid assembly error. We also tested the efficacy of DNA sequence information for identifying plant samples using 45 specimens recently collected in the Pilbara.

Results: Genome skimming was effective at producing genomic information at large scale. Substantial sequence information on the chloroplast genome was obtained from 96.1% of samples, and complete or near-complete sequences of the nuclear ribosomal RNA gene repeat were obtained from 93.3% of samples. We were able to extract sequences for the core DNA barcode regions rbcL and matK from 96% and 93.3% of samples, respectively. Read quality and DNA fragment length had significant effects on sequencing outcomes, and error correction of reads proved essential. Assembly problems were specific to certain taxa with low GC and high repeat content (Goodenia, Scaevola, Cyperus, Bulbostylis, Fimbristylis), suggesting biological rather than technical explanations. The structure of related genomes was needed to guide the assembly of repeats that exceeded the read length. DNA-based matching proved highly effective and showed that the efficacy for species identification declined in the order cpDNA >> rDNA > matK >> rbcL.

Conclusions: We showed that a large-scale approach to genome sequencing using herbarium specimens produces high-quality complete cpDNA and rDNA sequences as a source of data for DNA barcoding and phylogenomics.

Updated: 2020-01-04