mCNN-ETC: identifying electron transporters and their functional families by using multiple windows scanning techniques in convolutional neural networks with evolutionary information of protein sequences
Briefings in Bioinformatics (IF 6.8), Pub Date: 2021-08-11, DOI: 10.1093/bib/bbab352
Quang-Thai Ho, Nguyen Quoc Khanh Le, Yu-Yen Ou

In the past decade, scientists have used convolutional neural networks (CNNs) as powerful tools for visual data tasks. However, previous attempts to apply CNNs to protein function prediction and to extracting useful information from protein sequences have had notable limitations. In this research, we propose a new method that addresses the weaknesses of the previous method. mCNN-ETC is a deep learning model that transforms protein evolutionary information into image-like data with 20 channels, corresponding to the 20 amino acids. We built CNN layers with different scanning windows in parallel to strengthen the model's ability to detect useful patterns. We then filtered specific patterns through a 1-max pooling layer before feeding them into the prediction layer. In terms of application, this research addresses a basic problem in biology: predicting electron transporters and classifying them into their corresponding complexes. The model reached an accuracy of 97.41%, nearly 6% higher than its predecessor. We have also published a web server at http://bio219.bioinfo.yzu.edu.tw, which can be used free of charge for research purposes.
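To make the described pipeline concrete, here is a minimal NumPy sketch of the forward pass only. It is not the authors' implementation. It assumes the evolutionary information is an L×20 profile (for example, a position-specific scoring matrix) whose 20 columns act as input channels. Parallel convolution branches with different window widths scan along the sequence, 1-max pooling keeps each filter's strongest response, and the pooled features feed a single prediction unit. The window sizes (2, 4, 6), the filter count, the random untrained weights, and the helper names `conv1d_valid`, `one_max_pool`, `multi_window_forward` and `init_params` are all hypothetical. The sigmoid output covers only the transporter vs. non-transporter decision; classifying complexes would need a multi-class (softmax) head instead.

```python
import numpy as np


def conv1d_valid(pssm: np.ndarray, kernels: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Slide each filter of width w along the sequence axis (no padding).

    pssm:    (L, 20) evolutionary profile, one column per amino-acid channel (assumed layout)
    kernels: (n_filters, w, 20)
    bias:    (n_filters,)
    returns: (L - w + 1, n_filters) ReLU-activated feature map
    """
    n_filters, w, channels = kernels.shape
    length = pssm.shape[0]
    if pssm.shape[1] != channels:
        raise ValueError("profile channels do not match kernel channels")
    if length < w:
        raise ValueError(f"sequence length {length} is shorter than window {w}")
    # Each position t: dot product of the (w, 20) slice with every (w, 20) filter.
    feature_map = np.stack([
        np.tensordot(pssm[t:t + w], kernels, axes=([0, 1], [1, 2]))
        for t in range(length - w + 1)
    ]) + bias
    return np.maximum(feature_map, 0.0)  # ReLU


def one_max_pool(feature_map: np.ndarray) -> np.ndarray:
    """1-max pooling: keep each filter's strongest response over all positions."""
    return feature_map.max(axis=0)


def multi_window_forward(pssm: np.ndarray, params: dict):
    """Parallel window branches -> 1-max pooling -> concatenation -> sigmoid unit."""
    pooled = [one_max_pool(conv1d_valid(pssm, k, b)) for k, b in params["branches"]]
    features = np.concatenate(pooled)
    logit = features @ params["dense_w"] + params["dense_b"]
    prob = 1.0 / (1.0 + np.exp(-logit))
    return prob, features


def init_params(window_sizes=(2, 4, 6), n_filters=8, channels=20, seed=0) -> dict:
    """Random, untrained weights (hypothetical hyperparameters, for illustration only)."""
    rng = np.random.default_rng(seed)
    branches = [
        (rng.normal(0.0, 0.1, size=(n_filters, w, channels)), np.zeros(n_filters))
        for w in window_sizes
    ]
    dense_w = rng.normal(0.0, 0.1, size=len(window_sizes) * n_filters)
    return {"branches": branches, "dense_w": dense_w, "dense_b": 0.0}


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    toy_profile = rng.normal(size=(50, 20))  # hypothetical 50-residue profile
    params = init_params()
    prob, feats = multi_window_forward(toy_profile, params)
    print(feats.shape)  # (24,): 3 windows x 8 filters
    print(0.0 < prob < 1.0)
```

Because 1-max pooling collapses every branch to one value per filter, the concatenated feature vector has a fixed length (windows × filters) regardless of protein length. This is one reason the design suits variable-length sequences.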

Updated: 2021-08-11