cnnAlpha: Protein disordered regions prediction by reduced amino acid alphabets and convolutional neural networks.
Proteins: Structure, Function, and Bioinformatics (IF 2.9), Pub Date: 2020-06-14, DOI: 10.1002/prot.25966
Mauricio Oberti, Iosif I Vaisman

Intrinsically disordered regions (IDRs) play an important role in key biological processes and are closely related to human diseases. IDRs have great potential as targets for drug discovery, most notably in disordered binding regions. Accurate prediction of IDRs is challenging: they occur genome-wide, yet disordered residues make up only a small fraction of all residues, which makes them difficult targets for traditional classification techniques. Existing computational methods mostly rely on sequence profiles to improve accuracy, but computing these profiles is time-consuming and computationally expensive. This article describes an ab initio, sequence-only prediction method based on reduced amino acid alphabets and convolutional neural networks (CNNs) that aims to overcome this challenge. We experiment with six different 3-letter reduced alphabets. We argue that reducing the dimensionality of the input alphabet makes it easier for the convolutional step to detect complex patterns within the sequence. Experimental results show that our proposed IDR predictor performs on par with or better than other state-of-the-art methods in the same class, achieving an accuracy of 0.76 and an AUC of 0.85 on the publicly available dataset from the 10th Critical Assessment of protein Structure Prediction (CASP10). Our method is therefore suitable for proteome-wide disorder prediction, yielding accuracy similar to or better than existing approaches at a faster speed.
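The abstract does not list the six 3-letter alphabets or the network architecture. The following is therefore only a minimal sketch of the general idea under stated assumptions: a hypothetical hydrophobic/polar/charged grouping (not one of the paper's alphabets), one-hot encoding, a single untrained convolutional layer with ReLU, and a per-residue logistic output. The names `GROUPS`, `reduce_sequence`, `one_hot`, `conv1d_same`, and `disorder_scores` are illustrative and do not come from cnnAlpha.

```python
import numpy as np

# HYPOTHETICAL 3-letter reduced alphabet for illustration only; it is NOT one of
# the six alphabets evaluated in the paper. Residues are grouped roughly into
# hydrophobic ("h"), polar/neutral ("p") and charged ("c") classes.
GROUPS = {"h": "AVLIMFWC", "p": "GSTYNQHP", "c": "DEKR"}
REDUCED = {aa: sym for sym, members in GROUPS.items() for aa in members}
SYMBOLS = "hpc"  # channel order used by the one-hot encoding


def reduce_sequence(seq: str) -> str:
    """Map each standard amino acid to its reduced-alphabet symbol."""
    return "".join(REDUCED[aa] for aa in seq.upper())


def one_hot(reduced: str) -> np.ndarray:
    """Encode a reduced sequence as a (length, 3) one-hot matrix."""
    x = np.zeros((len(reduced), len(SYMBOLS)))
    for i, sym in enumerate(reduced):
        x[i, SYMBOLS.index(sym)] = 1.0
    return x


def conv1d_same(x: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """1D convolution (cross-correlation, as in CNN layers) with zero padding
    so the output keeps one row per residue, followed by ReLU.

    x: (length, channels); kernels: (n_filters, width, channels), width odd.
    """
    n_filters, width, _ = kernels.shape
    pad = width // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.empty((x.shape[0], n_filters))
    for i in range(x.shape[0]):
        window = xp[i:i + width]  # (width, channels) centred on residue i
        out[i] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return np.maximum(out, 0.0)


def disorder_scores(seq: str, kernels: np.ndarray, weights: np.ndarray,
                    bias: float) -> np.ndarray:
    """Per-residue disorder probabilities from one conv layer + logistic output."""
    features = conv1d_same(one_hot(reduce_sequence(seq)), kernels)
    logits = features @ weights + bias
    return 1.0 / (1.0 + np.exp(-logits))


# Usage with random (untrained) parameters: 4 filters of width 7 over 3 channels.
rng = np.random.default_rng(0)
kernels = rng.normal(size=(4, 7, 3))
weights = rng.normal(size=4)
scores = disorder_scores("MKVLAAGIEEPRSDQ", kernels, weights, bias=0.0)
print(scores.shape)  # (15,)
```

Because the parameters are random, these scores carry no biological meaning. A real predictor would learn the kernels and output weights from residue-level disorder annotations, and would typically use a deep-learning framework instead of the explicit loop. One way to read the dimensional-reduction argument: with 3 input channels, each width-7 kernel has 21 weights instead of the 140 needed for a 20-letter one-hot input.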

Updated: 2020-06-14