An Enhanced Protein Fold Recognition for Low Similarity Datasets using Convolutional and Skip-Gram Features with Deep Neural Network.
IEEE Transactions on NanoBioscience (IF 3.7), Pub Date: 2020-09-07, DOI: 10.1109/tnb.2020.3022456
Sanjay Bankapur, Nagamma Patil

Protein fold recognition is one of the important tasks of structural biology; it helps in addressing further challenges such as predicting protein tertiary structures and their functions. Many machine learning approaches have been published for identifying protein folds effectively. However, very few have reported fold recognition accuracy above 80% on benchmark datasets. In this study, an effective set of global and local features is extracted using the proposed Convolutional (Conv) and SkipXGram bi-gram (SXGbg) techniques, and fold recognition is performed using the proposed deep neural network. The proposed model achieved 91.4% fold accuracy on one of the derived low-similarity (< 25%) datasets of the latest extended version of SCOPe_2.07. The model was further evaluated on three popular, publicly available benchmark datasets (DD, EDD, and TG) and obtained fold accuracies of 85.9%, 95.8%, and 88.8%, respectively. This work is the first to report fold recognition accuracy above 85% on all of these benchmark datasets. The proposed model outperforms the best state-of-the-art models by 5% to 23% on DD, 2% to 19% on EDD, and 3% to 30% on TG.
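
The abstract names a skip-gram bi-gram feature but does not describe how it is computed. As a rough illustration of the general idea only, the sketch below counts residue pairs separated by a fixed number of skipped positions in a raw amino-acid sequence and normalises the counts. The function name `skip_bigram_features`, the `max_skip` parameter, and the choice of raw sequence input are assumptions made for this example; they are not taken from the paper, whose SXGbg technique may work differently (for instance, on profile matrices rather than on the sequence itself).

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def skip_bigram_features(sequence, max_skip=2):
    """Hypothetical helper: normalised skip-bigram frequencies.

    For each skip value 0..max_skip, count ordered residue pairs
    (sequence[i], sequence[i + skip + 1]) and divide by the number of
    counted pairs. Returns a dict keyed by (skip, pair).
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features = {}
    for skip in range(max_skip + 1):
        gap = skip + 1  # distance between the two residues of a pair
        counts = dict.fromkeys(pairs, 0)
        total = 0
        for i in range(len(sequence) - gap):
            pair = sequence[i] + sequence[i + gap]
            if pair in counts:  # ignore non-standard residues such as 'X'
                counts[pair] += 1
                total += 1
        for pair in pairs:
            features[(skip, pair)] = counts[pair] / total if total else 0.0
    return features


if __name__ == "__main__":
    feats = skip_bigram_features("ACAC", max_skip=1)
    print(len(feats))          # 2 skip values x 400 pairs
    print(feats[(0, "AC")])    # adjacent A-C pairs
    print(feats[(1, "AA")])    # A-A pairs with one residue skipped
```

Each skip value contributes a fixed-length block of 400 values regardless of sequence length, which is what makes features of this kind usable as input to a fixed-size classifier.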

Updated: 2020-09-07