Systematic evaluation of machine learning methods for identifying human-pathogen protein-protein interactions.
Briefings in Bioinformatics (IF 6.8), Pub Date: 2020-05-27, DOI: 10.1093/bib/bbaa068
Huaming Chen, Fuyi Li, Lei Wang, Yaochu Jin, Chi-Hung Chi, Lukasz Kurgan, Jiangning Song, Jun Shen

In recent years, high-throughput experimental techniques have significantly enhanced the accuracy and coverage of protein–protein interaction identification, including the identification of human–pathogen protein–protein interactions (HP-PPIs). Despite this progress, experimental methods are generally expensive in both time and labour, especially given the enormous number of potential interacting protein partners. Developing computational methods to predict interactions between humans and bacterial pathogens has thus become critical, both for facilitating the detection of interactions and for mining incomplete interaction maps. In this paper, we present a systematic evaluation of machine learning-based computational methods for predicting human–bacterium protein–protein interactions (HB-PPIs). We first reviewed a large number of publicly available HP-PPI databases and critically evaluated their availability. Taking advantage of the well-structured nature of the data, we then preprocessed it and identified six bacterial pathogens that could be used to study bacteria for which humans are the host. Additionally, we thoroughly reviewed the literature on ‘host–pathogen interactions’ and summarized the existing models, which we used to jointly study the impact of different feature representation algorithms and to evaluate the performance of existing machine learning models. Owing to the abundance of sequence information and the limited scale of other protein-related information, we adopted the primary protocol from the literature and dedicated our analysis to a comprehensive assessment of sequence information and machine learning models. We present this systematic evaluation of machine learning models and a wide range of sequence-based feature representation algorithms as a comparative survey of HB-PPI prediction performance.
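
To make the idea of a sequence-based feature representation concrete, the following is a minimal sketch of one common encoding, amino acid composition (AAC), applied to a human–bacterium protein pair. The abstract does not name the specific algorithms that were evaluated, so the choice of AAC, the function names `aac` and `pair_features`, and the toy sequences are all assumptions made for illustration only, not the authors' method.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so that every
# feature vector uses the same column layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence: str) -> list[float]:
    """Amino acid composition: fraction of each standard residue.

    Hypothetical helper for illustration; non-standard characters
    (e.g. 'X') are ignored rather than counted.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # Empty or fully non-standard sequence: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def pair_features(human_seq: str, bacterium_seq: str) -> list[float]:
    """Concatenate the two AAC vectors into one 40-dimensional feature
    vector describing a candidate human-bacterium protein pair."""
    return aac(human_seq) + aac(bacterium_seq)


# Toy sequences (assumed, not real proteins).
features = pair_features("MKVLAA", "GGSTA")
print(len(features))
```

A vector built this way could be passed to any standard classifier; concatenation is only one way to combine the two proteins and is likewise an assumption here.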

Updated: 2020-05-27