STAT: a fast, scalable, MinHash-based k-mer tool to assess Sequence Read Archive next-generation sequence submissions
Genome Biology (IF 12.3), Pub Date: 2021-09-20, DOI: 10.1186/s13059-021-02490-0
Kenneth S Katz, Oleg Shutov, Richard Lapoint, Michael Kimelman, J Rodney Brister, Christopher O'Sullivan

Sequence Read Archive submissions to the National Center for Biotechnology Information often lack useful metadata, which limits their utility. We describe the Sequence Taxonomic Analysis Tool (STAT), a scalable k-mer-based tool for fast assessment of the taxonomic diversity intrinsic to submissions, independent of metadata. We show that our MinHash-based k-mer tool is accurate and scalable, offering reliable criteria for efficient selection of data for further analysis by the scientific community. It validates submissions while also augmenting sample metadata with reliable, searchable taxonomic terms.
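
The abstract names MinHash over k-mers but does not describe how STAT applies it. As a generic illustration of the idea only, the following minimal sketch builds a bottom-s MinHash signature from the k-mers of a sequence and estimates the Jaccard similarity of two k-mer sets. Everything here is an assumption made for illustration: the function names, the k-mer length, the sketch size, and the choice of BLAKE2b as the hash are hypothetical and are not taken from STAT.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping k-mer of seq, uppercased."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer):
    """Map a k-mer to a stable 64-bit integer (BLAKE2b is an illustrative choice)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_sketch(seq, k=8, sketch_size=16):
    """Bottom-s MinHash: keep the sketch_size smallest distinct k-mer hashes."""
    hashes = {hash_kmer(km) for km in kmers(seq, k)}
    return sorted(hashes)[:sketch_size]


def jaccard_estimate(sketch_a, sketch_b, sketch_size=16):
    """Estimate the Jaccard similarity of two k-mer sets from their sketches."""
    # Take the smallest hashes of the union, then count how many are shared.
    merged = sorted(set(sketch_a) | set(sketch_b))[:sketch_size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)
```

A sketch of a read set can be compared against sketches of reference sequences in this way without storing every k-mer, which is the general reason MinHash-style subsampling scales to large archives. How STAT builds its reference database and assigns taxa is not covered by this abstract.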

Updated: 2021-09-20