Beyond DNA barcoding: The unrealized potential of genome skim data in sample identification.
Molecular Ecology (IF 4.5), Pub Date: 2020-06-16, DOI: 10.1111/mec.15507
Kristine Bohmann 1, Siavash Mirarab 2, Vineet Bafna 3, M Thomas P Gilbert 1, 4, 5

Genetic tools are increasingly used to identify and discriminate between species. One key transition in this process was the recognition of the potential of the ca. 658 bp fragment of the organelle cytochrome c oxidase I (COI) gene as a barcode region. This revolutionized animal bioidentification and led, among other things, to the establishment of the Barcode of Life Database (BOLD), which currently contains barcodes from more than 7.9 million specimens. Following this discovery, other organellar regions and markers, and the primers with which to amplify them, have been continuously proposed. Most recently, the field has taken the leap from PCR‐based generation of DNA references to shotgun sequencing‐based “genome skimming” alternatives, with the ultimate goal of assembling organellar reference genomes. Unfortunately, in genome skimming approaches, much of the nuclear genome (as much as 99% of the sequence data) is discarded, which is not only wasteful but can also limit the power of discrimination at, or below, the species level. Here, we advocate that the full shotgun sequence data can be used to assign an identity (which, for convenience, we term its “DNA‐mark”) to both voucher and query samples, without requiring any computationally intensive pretreatment (e.g. assembly) of reads. We argue that populating reference databases with such “DNA‐marks” will enable genome skimming to complement, or even replace, PCR of barcodes in future DNA‐based taxonomic identification, and we discuss how such methodology could ultimately enable identification to the population, or even individual, level.

Updated: 2020-07-30