DEEPSMP: A deep learning model for predicting the ectodomain shedding events of membrane proteins, Journal of Bioinformatics and Computational Biology

Journal of Bioinformatics and Computational Biology (IF 0.9), Pub Date: 2020-03-30, DOI: 10.1142/s0219720020500171
Zhongbo Cao (1, 2), Wei Du (1), Gaoyang Li (1), Huansheng Cao (3)

Membrane proteins play essential roles in modern medicine. Recent studies have reported that some membrane proteins involved in ectodomain shedding events are potential drug targets and biomarkers for some serious diseases. However, few effective tools exist for identifying the shedding events of membrane proteins, so an effective predictor is needed. In this study, we design an end-to-end prediction model using deep neural networks with long short-term memory (LSTM) units and an attention mechanism to predict the ectodomain shedding events of membrane proteins from sequence information alone. First, evolutionary profiles are encoded from the original protein sequences by running Position-Specific Iterated BLAST (PSI-BLAST) against the UniRef50 database. Then, LSTM units, which contain memory cells, retain information from past inputs to the network, and the attention mechanism is applied to detect sorting signals in the proteins regardless of their position in the sequence. Finally, a fully connected dense layer and a softmax layer produce the final prediction. We also aim to reduce overfitting by using dropout, L2 regularization, and bagging ensemble learning during training. To ensure a fair performance comparison, we first perform cross-validation on a training dataset obtained from an existing paper. In five-fold cross-validation, the proposed model achieves an average accuracy of 81.19% and an area under the receiver operating characteristic curve (AUC) of 0.835, compared with 75% and 0.78, respectively, for a previously published tool. To further validate the proposed model, we also evaluate it on an independent test dataset.
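The abstract does not specify the exact form of the attention layer. As a minimal sketch, assuming simple dot-product attention pooling over per-residue hidden states (such as those an LSTM would emit for each position of a PSSM-encoded sequence), the code below illustrates why attention can pick up a signal regardless of its position: each position's weight depends only on its content, and the pooled vector is a weighted sum. The function names, the scoring vector `w`, the toy values, and the two-class dense/softmax head are hypothetical and are not DeepSMP's published layers.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: Sequence[Sequence[float]],
                   w: Sequence[float]) -> Tuple[Vector, Vector]:
    """Hypothetical dot-product attention pooling.

    hidden: one hidden-state vector per residue (e.g. LSTM outputs)
    w:      scoring vector (a learned parameter in a real model)
    Returns the pooled context vector and the per-position weights.
    """
    # Each score depends only on the hidden state's content, never on its index.
    scores = [sum(h_i * w_i for h_i, w_i in zip(h, w)) for h in hidden]
    alphas = softmax(scores)
    dim = len(hidden[0])
    # Context = attention-weighted sum of the hidden states.
    context = [sum(a * h[d] for a, h in zip(alphas, hidden)) for d in range(dim)]
    return context, alphas


def dense_softmax(x: Sequence[float], weights: Sequence[Sequence[float]],
                  bias: Sequence[float]) -> Vector:
    """Fully connected layer plus softmax; index 0 = shed, 1 = not shed (assumed)."""
    logits = [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


if __name__ == "__main__":
    # Toy hidden states for a 4-residue sequence, dimension 3 (made-up values).
    hidden = [[0.1, 0.0, 0.2], [0.9, 0.8, 0.1], [0.0, 0.1, 0.0], [0.2, 0.1, 0.3]]
    w = [2.0, 2.0, 0.0]
    context, alphas = attention_pool(hidden, w)
    probs = dense_softmax(context, [[1.0, 1.0, 0.0], [-1.0, -1.0, 0.0]], [0.0, 0.0])
    print("attention:", [round(a, 3) for a in alphas])
    print("class probabilities:", [round(p, 3) for p in probs])
```

Reversing the order of `hidden` leaves `context` unchanged, which shows that the pooling step itself is position-independent; in a full model the LSTM states feeding it still depend on sequence order. Dropout, L2 regularization, and bagging are training-time measures and are omitted from this sketch.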
On the independent test dataset, the proposed model achieves an accuracy, sensitivity, and specificity of 83.14%, 84.08%, and 81.63%, respectively, compared with 70.20%, 71.97%, and 67.35% for the existing model. These experimental results show that the proposed model can serve as a general tool for predicting ectodomain shedding events of membrane proteins. The model pipeline and prediction results are available at http://www.csbg-jlu.info/DeepSMP/ .
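The evaluation metrics quoted in the abstract follow their standard definitions: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), accuracy = (TP+TN)/(TP+TN+FP+FN), and AUC equals the probability that a randomly chosen positive is scored above a randomly chosen negative. The sketch below computes them in plain Python; the labels, scores, and 0.5 decision threshold are hypothetical toy values, not data from the paper.

```python
from typing import Dict, Sequence


def confusion_counts(labels: Sequence[int], preds: Sequence[int]) -> Dict[str, int]:
    """Count TP, TN, FP, FN for binary labels (1 = shedding, 0 = non-shedding)."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for y, p in zip(labels, preds):
        if y == 1 and p == 1:
            counts["tp"] += 1
        elif y == 0 and p == 0:
            counts["tn"] += 1
        elif y == 0 and p == 1:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return counts


def threshold_metrics(labels: Sequence[int], scores: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity at a fixed (assumed) threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    c = confusion_counts(labels, preds)
    total = sum(c.values())
    return {
        "accuracy": (c["tp"] + c["tn"]) / total,
        "sensitivity": c["tp"] / (c["tp"] + c["fn"]),
        "specificity": c["tn"] / (c["tn"] + c["fp"]),
    }


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0]
    scores = [0.9, 0.7, 0.4, 0.6, 0.2]
    print(threshold_metrics(labels, scores))  # accuracy 0.6, sensitivity ~0.667, specificity 0.5
    print(auc(labels, scores))  # 5 of 6 pairs ranked correctly, about 0.833
```

Accuracy, sensitivity, and specificity depend on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two kinds of figures are reported separately.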

Updated: 2020-03-30
