Novel methods included in SpolLineages tool for fast and precise prediction of Mycobacterium tuberculosis complex spoligotype families
Database: The Journal of Biological Databases and Curation (IF 5.8), Pub Date: 2020-12-15, DOI: 10.1093/database/baaa108
David Couvin 1 , Wilfried Segretier 2 , Erick Stattner 2 , Nalin Rastogi 1

Bioinformatic tools are currently being developed to better understand the Mycobacterium tuberculosis complex (MTBC). Several approaches already exist for identifying MTBC lineages using classical genotyping methods such as mycobacterial interspersed repetitive units-variable number of tandem DNA repeats (MIRU-VNTR) and spoligotyping-based families. In the recently released SITVIT2 proprietary database of the Institut Pasteur de la Guadeloupe, a large number of spoligotype families were assigned either by manual curation/expertise or by an in-house algorithm. In this study, we present two complementary data-driven approaches allowing fast and precise family prediction from spoligotyping patterns. The first is based on data transformation and the use of decision tree classifiers. In contrast, the second searches for a set of simple rules using binary masks through a specifically designed evolutionary algorithm. A comparison with the three main approaches in the field highlighted the good performance of our contributions and a significant runtime gain. Finally, we propose the ‘SpolLineages’ software tool (https://github.com/dcouvin/SpolLineages), which implements these approaches for the identification of MTBC spoligotype families.
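
To make the idea of binary-mask rules concrete, here is a minimal sketch of how such a rule could be applied to a spoligotype once it has been found. It is not the SpolLineages implementation: the function names, the family labels "FamilyA" and "FamilyB" and both rules are hypothetical, and the evolutionary search that would discover the masks is not shown. The sketch assumes only that a spoligotype is a 43-character presence/absence string, and that a rule inspects the spacers selected by a mask and compares them with required values.

```python
N_SPACERS = 43  # a spoligotype records presence (1) or absence (0) of 43 spacers


def to_int(pattern: str) -> int:
    """Convert a 43-character '0'/'1' string into an integer bit field."""
    if len(pattern) != N_SPACERS or set(pattern) - {"0", "1"}:
        raise ValueError("expected a 43-character binary string")
    return int(pattern, 2)


def spacer_mask(spacers) -> int:
    """Build a mask with bits set for the given 1-based spacer positions."""
    mask = 0
    for s in spacers:
        # Spacer 1 is the leftmost character, i.e. the most significant bit.
        mask |= 1 << (N_SPACERS - s)
    return mask


# Hypothetical rules for illustration only, NOT real family definitions.
# Each rule: (family label, mask of inspected spacers, required values).
RULES = [
    ("FamilyA", spacer_mask(range(1, 35)), 0),       # spacers 1-34 all absent
    ("FamilyB", spacer_mask([33, 34, 35, 36]), 0),   # spacers 33-36 all absent
]


def predict(pattern: str, rules=RULES) -> str:
    """Return the label of the first rule whose masked bits match."""
    bits = to_int(pattern)
    for family, mask, required in rules:
        if (bits & mask) == required:
            return family
    return "Unknown"
```

For example, `predict("0" * 34 + "1" * 9)` falls under the first hypothetical rule, while a pattern with all 43 spacers present matches neither rule. Each rule costs a single bitwise AND and one comparison per pattern, which is one plausible reason a mask-based classifier can be fast.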

Updated: 2020-12-15