CHEER: hierarCHical taxonomic classification for viral mEtagEnomic data via deep leaRning
Methods (IF 4.8), Pub Date: 2020-05-01, DOI: 10.1016/j.ymeth.2020.05.018
Jiayu Shang, Yanni Sun

Abstract The fast accumulation of viral metagenomic data has contributed significantly to new RNA virus discovery. However, the short read size, complex composition, and large data size can all make taxonomic analysis difficult. In particular, commonly used alignment-based methods are not ideal choices for detecting new viral species. In this work, we present a novel hierarchical classification model named CHEER, which can conduct read-level taxonomic classification from order to genus for new species. By combining k-mer embedding-based encoding, hierarchically organized CNNs, and a carefully trained rejection layer, CHEER is able to assign correct taxonomic labels to reads from new species. We tested CHEER on both simulated and real sequencing data. The results show that CHEER can achieve higher accuracy than popular alignment-based and alignment-free taxonomic assignment tools. The source code, scripts, and pre-trained parameters for CHEER are available via GitHub: https://github.com/KennthShang/CHEER.
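
The two ideas named in the abstract, k-mer encoding of reads and top-down classification with rejection, can be illustrated with a minimal Python sketch. This is not the CHEER implementation: the k-mer length `K = 3`, the helper names `kmer_tokens` and `classify_top_down`, the confidence threshold, and the toy classifiers are all assumptions made for illustration. In the actual model the token indices would feed an embedding layer and trained CNNs, and the rejection step is a trained layer rather than a fixed cutoff.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

K = 3  # assumed k-mer length, chosen only to keep the example small

# Map every k-mer over ACGT to an integer index (usable as an embedding lookup key).
KMER_INDEX: Dict[str, int] = {
    "".join(p): i for i, p in enumerate(product("ACGT", repeat=K))
}


def kmer_tokens(read: str) -> List[int]:
    """Convert a read into overlapping k-mer indices.

    K-mers containing characters other than A, C, G, T are skipped.
    """
    read = read.upper()
    tokens = []
    for i in range(len(read) - K + 1):
        idx = KMER_INDEX.get(read[i:i + K])
        if idx is not None:
            tokens.append(idx)
    return tokens


# A classifier maps token indices to (predicted label, confidence in [0, 1]).
Classifier = Callable[[List[int]], Tuple[str, float]]


def classify_top_down(
    tokens: List[int],
    root: Classifier,
    children: Dict[str, Classifier],
    threshold: float = 0.6,
) -> List[str]:
    """Walk the taxonomy from the top level down.

    Stops at the first level whose confidence falls below `threshold`
    (a stand-in for rejection) or when the predicted label has no child
    classifier. Returns the labels accepted so far.
    """
    path: List[str] = []
    clf = root
    while clf is not None:
        label, confidence = clf(tokens)
        if confidence < threshold:
            break
        path.append(label)
        clf = children.get(label)
    return path


# Hypothetical stand-ins for trained per-level models.
def toy_order_clf(tokens: List[int]) -> Tuple[str, float]:
    return ("OrderA", 0.9)


def toy_family_clf(tokens: List[int]) -> Tuple[str, float]:
    return ("FamilyA1", 0.4)


if __name__ == "__main__":
    tokens = kmer_tokens("ACGTN")
    print(tokens)
    print(classify_top_down(tokens, toy_order_clf, {"OrderA": toy_family_clf}))
```

With the default threshold, the toy family-level prediction is rejected and the read keeps only its order-level label; lowering the threshold lets the walk continue to the family level. Stopping early in this way is one simple way to avoid forcing a fine-grained label onto a read from an unfamiliar species.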

Updated: 2020-05-01