Assessment of vector-host-pathogen relationships using data mining and machine learning.
Computational and Structural Biotechnology Journal (IF 4.4), Pub Date: 2020-06-25, DOI: 10.1016/j.csbj.2020.06.031
Diing D M Agany, Jose E Pietri, Etienne Z Gnimpieba

Infectious diseases, including vector-borne diseases transmitted by arthropods, are a leading cause of morbidity and mortality worldwide. In the era of big data, addressing broad-scale, fundamental questions regarding the complex dynamics of these diseases will increasingly require the integration of diverse datasets to produce new biological knowledge. This review provides a current snapshot of the systematic assessment of the relationships between microbial pathogens, arthropod vectors and mammalian hosts using data mining and machine learning. We employ PRISMA to identify 32 key papers relevant to this topic. Our analysis shows an increasing use of data mining and machine learning tasks and techniques, including prediction, classification, clustering, association rules mining, and deep learning, over the last decade. However, it also reveals a number of critical challenges in applying these to the study of vector-host-pathogen interactions at various systems biology levels. Here, relevant studies, current limitations and future directions are discussed. Furthermore, the quality of data in relevant papers was assessed using the FAIR (Findable, Accessible, Interoperable, Reusable) compliance criteria to evaluate and encourage reproducibility and shareability of research outcomes. Although shortcomings in their application remain, data mining and machine learning have significant potential to break new ground in understanding fundamental aspects of vector-host-pathogen relationships and their application in this field should be encouraged. In particular, while predictive modeling, feature engineering and supervised machine learning are already being used in the field, other data mining and machine learning methods such as deep learning and association rules analysis lag behind and should be implemented in combination with established methods to accelerate hypothesis and knowledge generation in the domain.




Updated: 2020-06-25