Weakly-Supervised Convolutional Neural Network Architecture for Predicting Protein-DNA Binding.
IEEE/ACM Transactions on Computational Biology and Bioinformatics (IF 3.6), Pub Date: 2018-08-07, DOI: 10.1109/tcbb.2018.2864203
Qinhu Zhang, Lin Zhu, Wenzheng Bao, De-shuang Huang

Although convolutional neural networks (CNNs) have outperformed conventional methods in predicting the sequence specificities of protein-DNA binding in recent years, they do not take full advantage of the intrinsic weakly supervised information in DNA sequences, namely that a bound sequence may contain multiple transcription factor binding sites (TFBSs). Here, we propose a weakly supervised convolutional neural network architecture (WSCNN) that combines multiple-instance learning (MIL) with CNN to further improve the prediction of protein-DNA binding. WSCNN first divides each DNA sequence into multiple overlapping subsequences (instances) with a sliding window, then models each instance separately using a CNN, and finally fuses the predicted scores of all instances in the same bag using one of four fusion methods: Max, Average, Linear Regression, and Top-Bottom Instances. Experimental results on in vivo and in vitro datasets illustrate the performance of the proposed approach. Moreover, models built on in vitro data using WSCNN can predict in vivo protein-DNA binding with good accuracy. In addition, through a series of experiments, we give a quantitative analysis of the importance of the reverse-complement mode in predicting in vivo protein-DNA binding, and explain why we do not directly use advanced pooling layers to combine MIL with CNN.
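
The pipeline described in the abstract (split a sequence into overlapping instances, score each instance, fuse the scores per bag) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the window and stride values, and the motif-matching `toy_score` (a stand-in for the trained CNN) are all assumptions made for illustration. Only the Max and Average fusions are shown, because the abstract does not define Linear Regression or Top-Bottom Instances in enough detail to reproduce them.

```python
from typing import Callable, List, Sequence


def sliding_windows(seq: str, window: int, stride: int) -> List[str]:
    """Split one DNA sequence (a bag) into overlapping subsequences (instances)."""
    if window >= len(seq):
        return [seq]
    return [seq[i:i + window] for i in range(0, len(seq) - window + 1, stride)]


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


def toy_score(instance: str) -> float:
    """Hypothetical stand-in for the per-instance CNN: 1.0 if an
    illustrative motif occurs in the instance, otherwise 0.0."""
    return 1.0 if "TGACTCA" in instance else 0.0


def fuse_max(scores: Sequence[float]) -> float:
    """Max fusion: the bag score is the highest instance score."""
    return max(scores)


def fuse_average(scores: Sequence[float]) -> float:
    """Average fusion: the bag score is the mean of the instance scores."""
    return sum(scores) / len(scores)


def bag_score(
    seq: str,
    score_fn: Callable[[str], float],
    fuse: Callable[[Sequence[float]], float],
    window: int = 9,   # assumed value, not taken from the paper
    stride: int = 3,   # assumed value, not taken from the paper
) -> float:
    """Score every instance in the bag, then fuse into one bag-level score."""
    instances = sliding_windows(seq, window, stride)
    return fuse([score_fn(inst) for inst in instances])


if __name__ == "__main__":
    seq = "AAAATGACTCAAAAA"
    print(sliding_windows(seq, 9, 3))
    print(bag_score(seq, toy_score, fuse_max))
    print(bag_score(seq, toy_score, fuse_average))
    print(reverse_complement(seq))
```

In this sketch, Max fusion treats a bag as bound if any single instance scores highly, whereas Average fusion rewards sequences in which several instances score well, which is one way to reflect the observation that a bound sequence may contain multiple binding sites.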

Updated: 2020-04-22