Identification of infectious disease-associated host genes using machine learning techniques.
BMC Bioinformatics ( IF 3 ) Pub Date : 2019-12-27 , DOI: 10.1186/s12859-019-3317-0
Ranjan Kumar Barman 1, 2 , Anirban Mukhopadhyay 3 , Ujjwal Maulik 2 , Santasabuj Das 1, 4

BACKGROUND With the global spread of multidrug resistance in pathogenic microbes, infectious diseases have emerged as a key public health concern in recent times. Identifying host genes associated with infectious diseases will improve our understanding of the mechanisms behind their development and help identify novel therapeutic targets.

RESULTS We developed a machine learning-based classification approach that identifies infectious disease-associated host genes by integrating sequence and protein interaction network features. Among the methods compared, a Deep Neural Network (DNN) model with 16 selected features for pseudo-amino acid composition (PAAC) and network properties achieved the highest accuracy, 86.33%, with a sensitivity of 85.61% and a specificity of 86.57%. The DNN classifier also attained an accuracy of 83.33% on a blind dataset and a sensitivity of 83.1% on an independent dataset. Furthermore, to predict unknown infectious disease-associated host genes, we applied the proposed DNN model to all reviewed proteins from the database. Seventy-six of the 100 highly predicted infectious disease-associated genes from our study were also found in experimentally verified human-pathogen protein-protein interactions (PPIs). Finally, we validated the highly predicted infectious disease-associated genes by disease and gene ontology enrichment analysis and found that many of them are shared with one or more other diseases, such as cancer and metabolic and immune-related diseases.

CONCLUSIONS To the best of our knowledge, this is the first computational method to identify infectious disease-associated host genes. The proposed method will help large-scale prediction of host genes associated with infectious diseases. However, our results indicate that, for small datasets, the advanced DNN-based method does not offer a significant advantage over simpler supervised machine learning techniques, such as Support Vector Machine (SVM) or Random Forest (RF), for predicting infectious disease-associated host genes. The significant overlap of infectious disease with cancer and metabolic disease in the disease and gene ontology enrichment analysis suggests that these diseases perturb the functions of the same cellular signaling pathways and may be treated by drugs that tend to reverse these perturbations. Moreover, identifying novel candidate genes associated with infectious diseases would help explain disease pathogenesis further and support the development of novel therapeutics.
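
The abstract reports accuracy, sensitivity, and specificity for the classifiers. As a reminder of how these three metrics relate to one another, here is a minimal sketch using the standard confusion-matrix definitions. The `ConfusionCounts` class and the example counts are hypothetical illustrations and are not taken from the paper or its datasets.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from a binary classifier (hypothetical container, not from the paper)."""
    tp: int  # disease-associated genes predicted as associated
    fn: int  # disease-associated genes predicted as not associated
    tn: int  # non-associated genes predicted as not associated
    fp: int  # non-associated genes predicted as associated


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all genes classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of true positives recovered (true positive rate)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of true negatives recovered (true negative rate)."""
    return c.tn / (c.tn + c.fp)


# Made-up counts for illustration only; they do not reproduce the paper's results.
counts = ConfusionCounts(tp=80, fn=20, tn=90, fp=10)
print(f"accuracy={accuracy(counts):.2%}")
print(f"sensitivity={sensitivity(counts):.2%}")
print(f"specificity={specificity(counts):.2%}")
```

Because accuracy pools positives and negatives, it always lies between the sensitivity and the specificity, weighted by class sizes, which is consistent with the reported 86.33% sitting between 85.61% and 86.57%.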

Updated: 2019-12-30