Identification of Cancerlectins By Using Cascade Linear Discriminant Analysis and Optimal g-gap Tripeptide Composition
Current Bioinformatics (IF 4), Pub Date: 2020-06-30, DOI: 10.2174/1574893614666190730103156
Liangwei Yang 1, Hui Gao 1, Keyu Wu 2, Haotian Zhang 2, Changyu Li 2, Lixia Tang 2

Background: Lectins are a diverse group of glycoproteins or glycoconjugate proteins that can be extracted from plants, invertebrates and higher animals. Cancerlectins, a class of lectins, play a key role in interactions between tumor cells and are being employed as therapeutic agents. A full understanding of cancerlectins is important because it provides a tool for guiding the future direction of cancer therapy.

Objective: To develop an accurate, practical and time-saving tool for identifying cancerlectins. A novel sequence-based method is proposed, along with an accompanying web server that provides access to it.

Methods: First, protein features were extracted with a new feature construction scheme termed g-gap tripeptide composition. A proposed cascade linear discriminant analysis (Cascade LDA) was then used to alleviate the difficulties of high dimensionality, with analysis of variance (ANOVA) as the feature importance criterion. Finally, a support vector machine (SVM) was used as the classifier to identify cancerlectins.
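
As an illustration of the feature extraction step, the sketch below computes a g-gap tripeptide composition vector in plain Python. The abstract does not give the exact definition, so this assumes the common convention in which the three residues of each tripeptide are separated by the same gap of g positions (g = 0 gives the ordinary tripeptide composition) and frequencies are normalized by the number of valid windows. The function name, the constants and the handling of non-standard residues are hypothetical choices, not the authors' implementation.

```python
from itertools import product

# The 20 standard amino acids; 20**3 = 8000 possible tripeptides.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]
INDEX = {tri: i for i, tri in enumerate(TRIPEPTIDES)}


def g_gap_tripeptide_composition(sequence, g):
    """Return an 8000-dimensional frequency vector for one protein sequence.

    Assumption: residues at positions i, i+g+1 and i+2(g+1) form one
    g-gap tripeptide. Windows containing a non-standard residue are skipped.
    """
    step = g + 1
    counts = [0] * len(TRIPEPTIDES)
    total = 0
    for i in range(len(sequence) - 2 * step):
        tri = sequence[i] + sequence[i + step] + sequence[i + 2 * step]
        idx = INDEX.get(tri)
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        # Sequence too short (or no valid window): return an all-zero vector.
        return [0.0] * len(TRIPEPTIDES)
    return [c / total for c in counts]


if __name__ == "__main__":
    # Toy sequence, for illustration only.
    features = g_gap_tripeptide_composition("ACDEFG", 1)
    print(len(features), features[INDEX["ADF"]])
```

Each choice of g yields 8000 features per sequence, which is why a dimensionality reduction step such as the ANOVA ranking and Cascade LDA described above is needed before classification.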

Results: In jackknife cross-validation, the proposed method achieved an accuracy of 91.34%, a sensitivity of 89.89%, a specificity of 92.48% and a Matthews correlation coefficient of 0.8318 using only 13 fusion features. These results are superior to those of other published methods in this domain.
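
For readers unfamiliar with the four reported metrics, the following sketch shows how they are conventionally computed from a binary confusion matrix, with cancerlectins treated as the positive class. The function name and the counts in the usage example are hypothetical and are not taken from the paper; the formulas are the standard definitions.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, specificity and MCC.

    tp/fn: positives predicted correctly/incorrectly.
    tn/fp: negatives predicted correctly/incorrectly.
    Assumes both classes are present, so tp + fn > 0 and tn + fp > 0.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(binary_metrics(tp=8, fn=2, tn=9, fp=1))
```

In a jackknife (leave-one-out) evaluation, each sequence is predicted by a model trained on all the others, and the counts passed to such a function are accumulated over all held-out predictions.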

Conclusion: This study proposes a new method based only on the primary structure of proteins, and experimental results show that it could be a promising tool for identifying cancerlectins. An open-access web server is made available to facilitate related work.




Updated: 2020-06-30