EnACP: An Ensemble Learning Model for Identification of Anticancer Peptides.
Frontiers in Genetics (IF 2.8), Pub Date: 2020-06-26, DOI: 10.3389/fgene.2020.00760
Ruiquan Ge 1 , Guanwen Feng 2 , Xiaoyang Jing 3 , Renfeng Zhang 4 , Pu Wang 5 , Qing Wu 1

Because cancer remains one of the main threats to human life, developing efficient cancer treatments is urgent. Anticancer peptides, which could overcome the significant side effects and poor results of traditional cancer treatments, have emerged as a potential alternative in recent years. However, identifying anticancer peptides by experimental methods is time-consuming and resource-intensive, so it is important to develop effective computational tools that quickly and accurately identify potential anticancer peptides from amino acid sequences. For most current computational methods, feature representation plays a key role in their success. This study proposes a novel, fast, and accurate approach to identifying anticancer peptides that combines diversified feature representations with ensemble learning. The feature representations encode information from multidimensional feature spaces, including sequence composition, sequence order, and physicochemical properties. To better model the potential relationships among peptides, multiple ensemble classifiers (LightGBMs) are first applied to the different feature sets. Their outputs are then used as inputs to a support vector machine classifier, which makes the final identification of anticancer peptides. Experimental results on cross-validation and independent test sets show that the method achieves better or comparable performance relative to other state-of-the-art methods.
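As a rough illustration of what encoding a peptide from "sequence composition" and "sequence-order" feature spaces can look like, the sketch below computes two common descriptors: amino acid composition (20 values) and dipeptide composition (400 values). The abstract does not say which descriptors EnACP actually uses, so the choice of these two, the function names, and the example sequence are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return the fraction of each standard residue in the peptide (20 values)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("peptide sequence must not be empty")
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return the fraction of each ordered residue pair (400 values).

    Adjacent pairs carry some local sequence-order information
    that plain composition discards.
    """
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("need at least two residues to form a dipeptide")
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = len(seq) - 1
    return [pairs[a + b] / total for a, b in product(AMINO_ACIDS, repeat=2)]


# Hypothetical peptide, used only to show the shapes of the feature vectors.
peptide = "GLFKKLLKKA"
aac = amino_acid_composition(peptide)
dpc = dipeptide_composition(peptide)
print(len(aac), len(dpc))
```

In a pipeline like the one the abstract describes, each such feature set would be passed to its own first-stage classifier, and the per-classifier outputs would form the input vector for the second-stage classifier. Residues outside the 20 standard letters are simply ignored here, so the fractions may sum to less than one for such sequences.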




Updated: 2020-07-30