A Method for Identifying Vesicle Transport Proteins Based on LibSVM and MRMD
Computational and Mathematical Methods in Medicine ( IF 2.809 ) Pub Date : 2020-10-19 , DOI: 10.1155/2020/8926750
Zhiyu Tao 1 , Yanjuan Li 1 , Zhixia Teng 1 , Yuming Zhao 1

With the development of computer technology, many machine learning algorithms have been applied to biology, giving rise to the discipline of bioinformatics. Protein function prediction is a classic research topic in this field. Although many researchers have made progress in identifying proteins with different algorithms, they often extract a large number of feature types and use very complex classification methods to obtain only a small improvement in classification performance, and the process is very time-consuming. In this research, we attempt to use as few features as possible to classify vesicle transport proteins while still obtaining a comparatively satisfactory classification result. We adopt CTDC, a submethod of the composition, transition, and distribution (CTD) method, to extract only 39 features from each sequence, and we use LibSVM as the classifier. We use the SMOTE method to address dataset imbalance. Our dataset contains 11619 protein sequences. We selected 4428 sequences to train the classification model and another 1832 sequences from the dataset to test it, achieving an accuracy of 71.77%. After dimension reduction with MRMD, the accuracy is 72.16%.
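
As a minimal sketch of the composition idea behind CTDC, the Python below computes, for one physicochemical property, the fraction of residues that fall into each of three amino-acid classes. The three-class hydrophobicity grouping and the function name `ctdc_composition` are assumptions chosen for illustration and are not taken from the paper. A total of 39 features would be consistent with repeating this calculation for 13 properties (13 × 3), but the abstract does not list the properties used.

```python
# Assumed three-class grouping for one property (hydrophobicity).
# The paper's abstract does not specify its groupings; this is illustrative only.
HYDROPHOBICITY_GROUPS = {
    "polar": "RKEDQN",
    "neutral": "GASTPHY",
    "hydrophobic": "CLVIMFW",
}


def ctdc_composition(sequence, groups=HYDROPHOBICITY_GROUPS):
    """Return the fraction of residues in each class, in the order of `groups`.

    Residues that belong to no class (for example 'X') still count toward
    the sequence length, so the fractions may sum to less than 1.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must be non-empty")
    length = len(seq)
    return [
        sum(seq.count(residue) for residue in members) / length
        for members in groups.values()
    ]


if __name__ == "__main__":
    # Hypothetical toy sequence: two residues from each class.
    print(ctdc_composition("RKGACL"))
```

Concatenating the three-value vectors from several property groupings gives one fixed-length feature vector per sequence, which can then be passed to a classifier such as an SVM.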

Updated: 2020-10-19