Prediction of apoptosis protein subcellular localization via heterogeneous features and hierarchical extreme learning machine.
SAR and QSAR in Environmental Research (IF 3), Pub Date: 2019-02-26, DOI: 10.1080/1062936x.2019.1576222
S Zhang 1 , T Zhang 1 , C Liu 2

Apoptosis is a fundamental process that controls normal tissue homeostasis by regulating the balance between cell proliferation and cell death. Predicting the subcellular location of apoptosis proteins helps in understanding the mechanism of programmed cell death, and bioinformatic methods for predicting protein subcellular localization open up opportunities in related fields. In this work, we propose using a hierarchical extreme learning machine (H-ELM) to classify high-dimensional input data without requiring a dimension reduction step, and it yields acceptable results. Features are extracted from different perspectives and then fused. Two types of features are derived from the position-specific scoring matrix (PSSM): the first describes correlation within the sequence by applying the autocorrelation function to relatively random sections of the sequence; the second is the Kullback-Leibler (K-L) divergence between two distributions formed by the amino acids' constituent proportions. An experiment shows that mixing features from different sources by simple concatenation yields poor results, whereas fusing them with t-distributed stochastic neighbor embedding (t-SNE) greatly improves classification. After tuning the hyper-parameters of H-ELM, the highest overall accuracy is 87.5% on the ZD98 dataset and 92.4% on the CL317 dataset.
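As a rough illustration of the two PSSM-derived feature types named in the abstract, the sketch below computes a lagged autocorrelation over a list of scores (for example, one PSSM column) and a K-L divergence between two amino-acid composition distributions. The function names, the `lag` parameter, and the smoothing constant `eps` are assumptions made for illustration; the abstract does not give the authors' exact formulation, so this should not be read as their implementation.

```python
import math


def autocorrelation(values, lag):
    """Mean-centred autocorrelation of a score sequence at a given lag.

    `values` is assumed to be one column of a PSSM (hypothetical input).
    """
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(values)")
    mean = sum(values) / n
    # Average the product of deviations for positions `lag` apart.
    total = sum(
        (values[i] - mean) * (values[i + lag] - mean)
        for i in range(n - lag)
    )
    return total / (n - lag)


def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for two discrete distributions over the same symbols.

    `eps` is an assumed smoothing constant that avoids log(0).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    # Terms with p_i == 0 contribute nothing to the sum.
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )
```

In a pipeline like the one described, such values would be computed per protein and collected into a feature vector before fusion and classification; how the sequence sections and the two distributions are chosen is left open here because the abstract does not specify it.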




Updated: 2019-02-26