Binning unassembled short reads based on k-mer abundance covariance using sparse coding.
GigaScience (IF 11.8), Pub Date: 2020-04-01, DOI: 10.1093/gigascience/giaa028
Olexiy Kyrgyzov¹, Vincent Prost¹,², Stéphane Gazut², Bruno Farcy³, Thomas Brüls¹

BACKGROUND Sequence-binning techniques enable the recovery of an increasing number of genomes from complex microbial metagenomes. They typically require prior metagenome assembly and therefore inherit its computational cost and drawbacks, e.g., biases against low-abundance genomes and the inability to conveniently assemble multi-terabyte datasets.

RESULTS We present a scalable pre-assembly binning scheme (i.e., one operating on unassembled short reads) that enables latent genome recovery by leveraging sparse dictionary learning and elastic-net regularization. We use it to recover hundreds of metagenome-assembled genomes, including very low-abundance genomes, from a joint analysis of microbiomes from the LifeLines DEEP population cohort (n = 1,135, >10¹⁰ reads).

CONCLUSION We showed that sparse coding techniques can be leveraged to carry out read-level binning at large scale and that, despite lower genome reconstruction yields compared to assembly-based approaches, bin-first strategies can complement the more widely used assembly-first protocols by targeting distinct genome segregation profiles. Read enrichment levels spanning 6 orders of magnitude in relative abundance were observed, indicating that the method has the power to recover genomes that consistently segregate at low levels.
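The idea of binning by k-mer abundance covariance with an elastic-net sparse code can be illustrated with a minimal, hypothetical Python sketch. It is not the authors' implementation: it counts k-mers per sample, turns each k-mer into a cross-sample abundance vector, codes that vector against a small dictionary of latent abundance profiles with elastic-net coordinate descent, and assigns the k-mer to the atom with the largest coefficient. The function names, the fixed hand-picked dictionary `atoms` (the dictionary-learning step is omitted), the unit-norm scaling, and the argmax bin assignment are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical sketch (not the authors' pipeline): bin k-mers by sparse-coding
# their cross-sample abundance profiles against a FIXED toy dictionary whose
# entries ("atoms") stand in for latent genome abundance profiles.


def kmer_counts(reads, k):
    """Count every k-mer occurring in one sample's reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def abundance_profiles(samples, k):
    """Map each k-mer to its vector of counts across all samples."""
    per_sample = [kmer_counts(reads, k) for reads in samples]
    kmers = sorted(set().union(*per_sample))
    return {km: [c[km] for c in per_sample] for km in kmers}


def soft_threshold(z, t):
    """Proximal operator of t*|w| (the L1 part of the elastic-net penalty)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net_code(x, atoms, alpha=0.1, l1_ratio=0.5, n_iter=100):
    """Cyclic coordinate descent for the elastic-net problem
        min_w 1/(2n)*||x - sum_j w_j*atoms[j]||^2
              + alpha*l1_ratio*||w||_1 + alpha*(1 - l1_ratio)/2*||w||^2
    where n = len(x) and every atom has the same length as x."""
    n = len(x)
    w = [0.0] * len(atoms)
    residual = list(x)  # x - D w, kept up to date incrementally
    for _ in range(n_iter):
        for j, d in enumerate(atoms):
            sq = sum(v * v for v in d) / n
            if sq == 0.0:
                continue  # an all-zero atom cannot explain anything
            # Correlation of atom j with the residual that excludes atom j.
            rho = sum(d[i] * (residual[i] + d[i] * w[j]) for i in range(n)) / n
            new = soft_threshold(rho, alpha * l1_ratio) / (sq + alpha * (1 - l1_ratio))
            delta = new - w[j]
            if delta:
                for i in range(n):
                    residual[i] -= d[i] * delta
                w[j] = new
    return w


def bin_kmers(profiles, atoms, **kwargs):
    """Assign each k-mer to the atom with the largest positive coefficient,
    or None when no coefficient is positive. Assumes atoms is non-empty."""
    bins = {}
    for km, counts in profiles.items():
        norm = math.sqrt(sum(c * c for c in counts))
        if norm == 0.0:
            bins[km] = None
            continue
        # Unit-normalise so the cross-sample pattern, not raw depth, matters.
        x = [c / norm for c in counts]
        w = elastic_net_code(x, atoms, **kwargs)
        best = max(range(len(w)), key=lambda j: w[j])
        bins[km] = best if w[best] > 0 else None
    return bins


if __name__ == "__main__":
    # Toy data: one "genome" present only in sample 0, another only in sample 1.
    samples = [["AAAAAA"], ["CCCCCC"]]
    atoms = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical, hand-picked dictionary
    print(bin_kmers(abundance_profiles(samples, k=3), atoms))  # {'AAA': 0, 'CCC': 1}
```

Fixing the dictionary keeps the sketch short. In a bin-first scheme the atoms would themselves be learned from the abundance data, and reads could then be assigned to bins through the bins of their k-mers; both steps are left out here and are assumptions rather than details taken from the abstract.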

Updated: 2020-04-17