Sequence representation approaches for sequence-based protein prediction tasks that use deep learning
Briefings in Functional Genomics (IF 2.5). Pub Date: 2021-02-02. DOI: 10.1093/bfgp/elaa030
Feifei Cui 1 , Zilong Zhang 1 , Quan Zou 2

Deep learning is increasingly used in bioinformatics, especially in sequence-based protein prediction tasks, because large amounts of biological data are now available and deep learning techniques have developed rapidly in recent years. For these tasks, selecting a suitable model architecture is essential, and the representation of the sequence data is a major factor in model performance. Here, we summarize the main approaches used to represent protein sequence data (amino acid sequence encoding or embedding): end-to-end embedding methods, non-contextual embedding methods, embedding methods that use transfer learning, and other methods applied to specific tasks (such as protein sequence embedding based on extracted features for protein structure prediction, and graph convolutional network-based embedding for drug discovery). We also review the architectures of the various types of embedding models from a theoretical perspective, along with the development of these sequence embedding approaches, to help researchers and users select the model that best suits their requirements.
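To make "amino acid sequence encoding" concrete, the following is a minimal sketch of the two simplest representations: integer indices (the typical input to a trainable, end-to-end embedding layer) and one-hot vectors. It is an illustration rather than code from the paper. The alphabet ordering, the function names `integer_encode` and `one_hot_encode`, and the example peptide are all assumptions chosen for this sketch.

```python
# Assumption: the 20 standard amino acids, ordered alphabetically by
# one-letter code. Any fixed ordering works as long as it is used consistently.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def integer_encode(sequence):
    """Map each residue to its integer index in AMINO_ACIDS.

    Raises KeyError for any character outside the 20 standard residues
    (for example 'X' or 'U'); a real pipeline would need a policy for those.
    """
    return [AA_TO_INDEX[aa] for aa in sequence.upper()]


def one_hot_encode(sequence):
    """Return one 20-dimensional 0/1 vector per residue."""
    rows = []
    for idx in integer_encode(sequence):
        row = [0] * len(AMINO_ACIDS)
        row[idx] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    peptide = "MKV"  # hypothetical example sequence
    print(integer_encode(peptide))
    for row in one_hot_encode(peptide):
        print(row)
```

One-hot vectors carry no notion of similarity between residues, which is one motivation for the learned embedding methods the abstract lists.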

Updated: 2021-03-03