CDKAM: a taxonomic classification tool using discriminative k-mers and approximate matching strategies
BMC Bioinformatics (IF 3), Pub Date: 2020-10-20, DOI: 10.1186/s12859-020-03777-y
Van-Kien Bui, Chaochun Wei

Current taxonomic classification tools rely on exact string matching algorithms, which work well on data from next-generation sequencing technology. However, the distinctive error patterns of third-generation sequencing (TGS) technologies can reduce the accuracy of these programs. We developed CDKAM, a Classification tool using Discriminative K-mers and an Approximate Matching algorithm. The approximate matching method is used to search for k-mers and has two phases: a quick mapping phase and a dynamic programming phase. We tested CDKAM on simulated datasets as well as real TGS datasets and compared its performance with that of existing methods. CDKAM performed better in many respects, especially when classifying TGS data with an average read length of 1000–1500 bases. CDKAM is an effective program for TGS metagenome sequence classification, with higher accuracy and a lower memory requirement than the compared methods, and it achieves high species-level accuracy.
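
The abstract gives no implementation details, so the following Python sketch only illustrates the general idea of a two-phase approximate k-mer search; it is not CDKAM's actual code. Everything in it is an assumption made for illustration: the names (KmerIndex, build_index, classify_read), the reading of "discriminative" as "occurring in exactly one reference taxon", the use of shared prefix or suffix seeds as the quick mapping phase, a Levenshtein dynamic program as the verification phase, a limit of one error per k-mer, and a simple majority vote over k-mer hits.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterator, List, Optional, Set, Tuple


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def edit_distance(a: str, b: str, limit: int) -> int:
    """Levenshtein distance by dynamic programming.

    Returns limit + 1 as soon as the distance is known to exceed limit.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # match or substitution
        if min(cur) > limit:  # row minima never decrease, so stop early
            return limit + 1
        prev = cur
    return min(prev[-1], limit + 1)


class KmerIndex:
    """Hypothetical index: exact k-mer table plus seed buckets."""

    def __init__(self, k: int, seed_len: int) -> None:
        self.k = k
        self.seed_len = seed_len
        self.exact: Dict[str, str] = {}
        self.by_seed: Dict[Tuple[str, str], List[str]] = defaultdict(list)

    def _seeds(self, kmer: str) -> List[Tuple[str, str]]:
        return [("prefix", kmer[:self.seed_len]),
                ("suffix", kmer[-self.seed_len:])]

    def add(self, kmer: str, taxon: str) -> None:
        self.exact[kmer] = taxon
        for seed in self._seeds(kmer):
            self.by_seed[seed].append(kmer)

    def lookup(self, kmer: str, max_errors: int = 1) -> Optional[str]:
        if kmer in self.exact:
            return self.exact[kmer]
        # Phase 1 (quick mapping): collect k-mers sharing a prefix or suffix seed.
        candidates: Set[str] = set()
        for seed in self._seeds(kmer):
            candidates.update(self.by_seed.get(seed, ()))
        # Phase 2 (dynamic programming): keep the closest candidate within the limit.
        best, best_dist = None, max_errors + 1
        for cand in sorted(candidates):
            dist = edit_distance(kmer, cand, max_errors)
            if dist < best_dist:
                best, best_dist = cand, dist
        return self.exact[best] if best is not None else None


def build_index(references: Dict[str, str], k: int, seed_len: int) -> KmerIndex:
    """Index only k-mers found in exactly one taxon (assumed meaning of 'discriminative')."""
    owners: Dict[str, Set[str]] = defaultdict(set)
    for taxon, genome in references.items():
        for kmer in kmers(genome, k):
            owners[kmer].add(taxon)
    index = KmerIndex(k, seed_len)
    for kmer, taxa in owners.items():
        if len(taxa) == 1:
            index.add(kmer, next(iter(taxa)))
    return index


def classify_read(read: str, index: KmerIndex, max_errors: int = 1) -> Optional[str]:
    """Assign the taxon with the most k-mer hits, or None if nothing matches."""
    votes: Counter = Counter()
    for kmer in kmers(read, index.k):
        taxon = index.lookup(kmer, max_errors)
        if taxon is not None:
            votes[taxon] += 1
    return votes.most_common(1)[0][0] if votes else None


# Toy data, invented for this example only.
toy_refs = {"taxA": "ACGTTGCAAGCT", "taxB": "GGATCCTAGGAC"}
toy_index = build_index(toy_refs, k=8, seed_len=4)
noisy_read = "ACGTTGCTAGCT"  # taxA with one substituted base
print(classify_read(noisy_read, toy_index))
```

In this sketch the seed lookup keeps the expensive dynamic program away from most of the index: with a seed length of at most half of k, a k-mer carrying a single error still shares an intact prefix or suffix seed with its source k-mer, so the correct candidate is always among those verified.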

Updated: 2020-10-20