Integrating Low-Order and High-Order Correlation Information for Identifying Phage Virion Proteins.
Journal of Computational Biology (IF 1.4), Pub Date: 2023-09-20, DOI: 10.1089/cmb.2022.0237
Hongliang Zou, Wanting Yu

Phage virion proteins (PVPs) play an important role in the host cell. Fast and accurate identification of PVPs benefits the discovery and development of related drugs. Although wet-lab experiments are the first choice for identifying PVPs, they are costly and time-consuming. Researchers have therefore turned to computational models, which can speed up related studies. In this study, we propose a novel machine-learning model to identify PVPs. First, 50 different physicochemical properties were used to represent protein sequences. Next, two approaches, Pearson's correlation coefficient (PCC) and the maximal information coefficient (MIC), were employed to extract discriminative information. To capture high-order correlation information, PCC and MIC were then applied a second time. After that, the least absolute shrinkage and selection operator (LASSO) algorithm was used to select the optimal feature subset. Finally, the selected features were fed into a support vector machine (SVM) to discriminate PVPs from phage non-virion proteins. We performed experiments on two different datasets to validate the effectiveness of the proposed method. The experimental results showed a significant improvement in performance compared with state-of-the-art approaches, indicating that the proposed computational model may become a powerful predictor for identifying PVPs.
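
The abstract does not spell out how the correlation features are computed, so the following is only a minimal sketch of one plausible reading of the PCC part of the pipeline. It assumes that each physicochemical property turns a sequence into a numeric track, that the low-order features are the pairwise PCC values between tracks, and that the "high-order" features come from applying PCC again to the rows of that first correlation matrix. The function names and the `TOY_SCALES` values are hypothetical placeholders, not the 50 properties used in the paper, and the MIC, LASSO, and SVM stages are omitted.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # convention chosen for this sketch when a track is constant
    return cov / sqrt(vx * vy)

def correlation_matrix(rows):
    """Square matrix of pairwise PCC values between the given rows."""
    k = len(rows)
    return [[pearson(rows[i], rows[j]) for j in range(k)] for i in range(k)]

def upper_triangle(matrix):
    """Flatten the strictly upper triangle (each unordered pair once)."""
    k = len(matrix)
    return [matrix[i][j] for i in range(k) for j in range(i + 1, k)]

def pcc_features(sequence, scales):
    """Low-order plus assumed high-order PCC features for one sequence."""
    # One numeric track per property: the property value at each residue.
    tracks = [[scale[aa] for aa in sequence] for scale in scales]
    low = correlation_matrix(tracks)
    # Assumption: "high-order" means correlating the rows of the
    # low-order correlation matrix with each other.
    high = correlation_matrix(low)
    return upper_triangle(low) + upper_triangle(high)

# Hypothetical placeholder scales over a reduced alphabet (made-up values).
TOY_SCALES = [
    {"A": 1.0, "C": 2.0, "D": -1.0, "E": 0.5},
    {"A": 0.2, "C": 0.9, "D": 0.4, "E": 0.7},
    {"A": 3.0, "C": 1.0, "D": 2.0, "E": 5.0},
]

features = pcc_features("ACDEACCEDA", TOY_SCALES)
```

Under these assumptions the feature vector has a fixed length regardless of sequence length: k properties give k(k-1)/2 low-order and k(k-1)/2 high-order values, so 3 toy scales yield 6 features and 50 properties would yield 1225 of each kind. A vector of this form could then be passed to a feature selector and a classifier.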

Updated: 2023-09-20