EDeepSSP: Explainable deep neural networks for exact splice sites prediction
Journal of Bioinformatics and Computational Biology (IF 0.9), Pub Date: 2020-05-21, DOI: 10.1142/s0219720020500249
Santhosh Amilpur, Raju Bhukya

Splice site prediction is crucial for understanding gene regulation and gene function, and for improving genome annotation. Many computational methods exist for recognizing splice sites. Although most of them achieve competitive performance, their interpretability remains a challenge. Moreover, traditional machine learning methods rely on manually extracted features, which is a tedious job. To address these challenges, we propose a deep learning-based approach (EDeepSSP) that employs a convolutional neural network (CNN) architecture for automatic feature extraction and effectively predicts splice sites. Our model, EDeepSSP, addresses the opaque nature of CNNs by extracting significant motifs and explaining why these motifs are vital for predicting splice sites. In this study, experiments were conducted on six benchmark acceptor and donor datasets from human, cress, and fly. The results show that EDeepSSP outperforms many state-of-the-art approaches. EDeepSSP achieves the highest area under the receiver operating characteristic curve (AUC_ROC) and area under the precision-recall curve (AUC_PR), 99.32% and 99.26% respectively, on the human donor datasets. We also analyze various filter activities and feature activations, and extract the significant motifs responsible for splice site prediction. Further, we validate the motifs learned by our model against known motifs from the JASPAR splice site database.
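As a rough illustration of why convolutional filters over DNA can be read as motif detectors, the sketch below one-hot encodes a sequence and slides a single filter along it. It is a minimal example under stated assumptions, not the EDeepSSP architecture: the function names (`one_hot`, `conv1d_scan`, `best_match`) and the hand-written `GT_FILTER` weights are hypothetical, chosen only to reward the canonical GT dinucleotide found at donor sites. A trained CNN would learn many wider filters from data.

```python
# Illustrative sketch only: not the EDeepSSP model or its learned weights.
ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element vectors (channel order A, C, G, T).

    Any other symbol (for example N) becomes an all-zero vector.
    """
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET]
            for base in seq.upper()]


def conv1d_scan(encoded, kernel):
    """Slide `kernel` (width x 4 weights) along the sequence, stride 1, no padding."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = 0.0
        for offset in range(width):
            for channel in range(4):
                score += encoded[start + offset][channel] * kernel[offset][channel]
        scores.append(score)
    return scores


# Hypothetical hand-written filter (an assumption for illustration):
# +1 for the preferred base at each position, -1 for every other base.
GT_FILTER = [
    [-1.0, -1.0, 1.0, -1.0],  # position 0 prefers G
    [-1.0, -1.0, -1.0, 1.0],  # position 1 prefers T
]


def best_match(seq, kernel):
    """Return (position, score) of the window with the highest filter activation."""
    scores = conv1d_scan(one_hot(seq), kernel)
    position = max(range(len(scores)), key=scores.__getitem__)
    return position, scores[position]


if __name__ == "__main__":
    print(best_match("CAGGTAAGA", GT_FILTER))
```

Reading off the window where a filter fires most strongly, as `best_match` does here, is one simple way to connect a filter's activation to a sequence motif; the paper's own motif-extraction procedure may differ.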

Updated: 2020-05-21