DeepSite: bidirectional LSTM and CNN models for predicting DNA–protein binding
International Journal of Machine Learning and Cybernetics ( IF 5.6 ) Pub Date : 2019-07-29 , DOI: 10.1007/s13042-019-00990-x
Yongqing Zhang , Shaojie Qiao , Shengjie Ji , Yizhou Li

Transcription factors are cis-regulatory molecules that bind to specific sub-regions of DNA promoters and initiate transcription, the process that regulates the conversion of genetic information from DNA to RNA. Several computational methods have been developed to predict DNA–protein binding sites in DNA sequences using convolutional neural networks (CNNs). However, within a CNN-only framework these techniques cannot capture the dependency information in a DNA sequence. In addition, these methods are not accurate enough when predicting DNA–protein binding sites from the DNA sequence. In this study, we combine bidirectional long short-term memory (BLSTM) with a CNN to capture long-term dependencies between the sequence motifs in DNA; we call the resulting method DeepSite. Unlike a traditional CNN, DeepSite comprises six layers: an input layer, a BLSTM layer, a CNN layer, a pooling layer, a fully connected layer and an output layer. DeepSite predicts DNA–protein binding sites with 87.12% sensitivity, 91.06% specificity, 89.19% accuracy and 0.783 MCC when tested on the 690 ChIP-seq experiments from ENCODE. Lastly, we conclude that our proposed method can also be applied to find DNA–protein binding sites in different DNA sequences.
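The four figures quoted in the abstract (sensitivity, specificity, accuracy and MCC) are standard binary-classification metrics derived from a confusion matrix. As a minimal sketch of how such figures are typically computed, the function below takes counts of true/false positives and negatives. The function name `binary_metrics` and the example counts are hypothetical illustrations; they are not taken from the paper and do not reproduce its results.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, accuracy and MCC from confusion counts.

    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    """
    sensitivity = tp / (tp + fn)          # fraction of true binding sites recovered
    specificity = tn / (tn + fp)          # fraction of non-binding sites rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not from the DeepSite paper).
metrics = binary_metrics(tp=90, fp=10, tn=85, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MCC is often reported alongside accuracy because it stays informative when the positive and negative classes are imbalanced, which is a plausible reason for its inclusion here.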

Updated: 2019-07-29