A Deep Learning Framework for Gene Ontology Annotations With Sequence- and Network-Based Information
IEEE/ACM Transactions on Computational Biology and Bioinformatics (IF 4.5), Pub Date: 2020-01-23, DOI: 10.1109/tcbb.2020.2968882
Fuhao Zhang, Hong Song, Min Zeng, Fangxiang Wu, Yaohang Li, Yi Pan, Min Li

Knowledge of protein functions plays an important role in biology and medicine. With the rapid development of high-throughput technologies, a huge number of proteins have been discovered, yet many of them still lack functional annotations. A protein usually has multiple functions, and some functions or biological processes require interactions among several proteins. Gene Ontology (GO) provides a useful classification of protein functions and contains more than 40,000 terms. We propose a deep learning framework called DeepGOA to predict protein functions from protein sequences and protein-protein interaction (PPI) networks. For protein sequences, we extract two types of information: sequence semantic information and subsequence-based features. We use the word2vec technique to represent protein sequences numerically, and use a bidirectional long short-term memory network (Bi-LSTM) and a multi-scale convolutional neural network (multi-scale CNN) to obtain the global and local semantic features of protein sequences, respectively. Additionally, we use the InterPro tool to scan protein sequences and extract subsequence-based information, such as domains and motifs. This information is then fed into a neural network to generate high-quality features. For the PPI network, the DeepWalk algorithm is applied to generate an embedding of the network. The two types of features (sequence-based and network-based) are then concatenated to predict protein functions. To evaluate the performance of DeepGOA, several different evaluation methods and metrics are used. The experimental results show that DeepGOA outperforms DeepGO and BLAST.
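
The abstract does not give implementation details, so the following is only a rough sketch of the input preparation it implies, not the authors' code. It shows how a protein sequence might be split into overlapping "words" for word2vec, how DeepWalk-style truncated random walks could be drawn from a PPI graph, and how the two resulting feature vectors would be joined. The k-mer length of 3, the walk settings, the toy graph, and all function names are assumptions made for illustration; training the embeddings themselves (the skip-gram step) is omitted.

```python
import random


def kmer_tokens(sequence, k=3):
    """Split a protein sequence into overlapping k-mers ("words").

    k=3 is an assumed value; the abstract does not state the word length.
    """
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def random_walks(adjacency, walk_length=5, walks_per_node=2, seed=0):
    """Draw truncated random walks over a graph given as an adjacency dict.

    DeepWalk treats each walk as a "sentence" of node IDs and trains a
    skip-gram model on them; only the walk generation is sketched here.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adjacency)
        rng.shuffle(nodes)  # visit start nodes in random order on each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:  # isolated protein: the walk stops early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


def concatenate_features(sequence_features, network_features):
    """Join sequence-based and network-based feature vectors end to end."""
    return list(sequence_features) + list(network_features)


# Hypothetical toy PPI network: protein ID -> list of interaction partners.
toy_ppi = {"P1": ["P2", "P3"], "P2": ["P1"], "P3": ["P1"], "P4": []}

tokens = kmer_tokens("MKTAYIAK")
walks = random_walks(toy_ppi)
combined = concatenate_features([0.1, 0.2], [0.3])
```

In a fuller pipeline, the k-mer tokens and the walks would each be passed to an embedding model, and the concatenated vector would feed a multi-label classifier over GO terms.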

Updated: 2020-01-23