Enhancer recognition and prediction during spermatogenesis based on deep convolutional neural networks.
Molecular Omics (IF 2.9), Pub Date: 2020-05-29, DOI: 10.1039/d0mo00031k
Chengzhang Sun, Ning Zhang, Peng Yu, Xiaolong Wu, Qun Li, Tongtong Li, Hao Li, Xia Xiao, Abdullah Shalmani, Leijie Li, Dongxue Che, Xiaodan Wang, Peng Zhang, Ziyu Chen, Tong Liu, Jianbang Zhao, Jinlian Hua, Mingzhi Liao

Motivation: Enhancers play an important role in regulating gene expression during spermatogenesis. The development of ChIP-chip and ChIP-seq technologies has enabled researchers to study the relationship between enhancers, DNA sequences, and histone modifications. However, whether enhancers can be predicted from locally conserved DNA sequences and similar histone modification features remains unclear. Here, we propose a convolutional neural network (CNN) model to predict enhancers that can regulate gene expression during spermatogenesis.

Results: We obtained a positive set of enhancers using P300 loci verified by experiments, and constructed a negative set using promoters as non-enhancer loci. The model was trained independently on each specific cell type during spermatogenesis, and a transfer learning strategy was used to fine-tune it so that it can be trained and adapted to other cells quickly. We visualized the convolution layer of the trained model and aligned the predicted enhancers with the JASPAR database. The results showed that the model closely matched some important transcription factors in spermatogenesis, supporting its reliability. Finally, we compared the CNN algorithm with gkmSVM, a support vector machine algorithm. It is well known that CNNs perform better than gkmSVM, especially in generalization ability. Our work demonstrated the strong learning ability of CNNs and their low CPU requirements: with a small number of convolution layers and a simple network structure, the model avoided overfitting the training data. At the end of the experiment, we used the trained model to build an enhancer recognition website for further research and communication.
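To make the idea of a convolution layer acting on DNA more concrete, here is a minimal sketch in plain Python. It is not the authors' code or architecture. The motif `TGAC`, the +1/-1 filter weights, and all function names are hypothetical, chosen only for illustration. It shows how a one-hot encoded sequence can be scanned by a single filter that behaves like a motif detector, which is roughly why learned filters can be compared against motif databases such as JASPAR.

```python
# Illustrative sketch only: hypothetical names and weights, not the published model.
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T).

    Bases outside ACGT (for example N) become all-zero rows.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def conv1d_scan(encoded, kernel):
    """Slide a (width x 4) kernel along the sequence; return one score per window."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = 0.0
        for offset in range(width):
            for channel in range(4):
                score += encoded[start + offset][channel] * kernel[offset][channel]
        scores.append(score)
    return scores

def relu_max_pool(scores):
    """Apply ReLU, then global max pooling: the strongest match in the sequence."""
    return max((max(s, 0.0) for s in scores), default=0.0)

# Hypothetical filter: +1 where the base matches the made-up motif, -1 otherwise.
MOTIF = "TGAC"
kernel = [[1.0 if b == m else -1.0 for b in BASES] for m in MOTIF]

scores = conv1d_scan(one_hot("AATGACGT"), kernel)
best = relu_max_pool(scores)
print(scores, best)
```

In a trained network the filter weights are learned from the positive and negative sets rather than written by hand, and many such filters feed later layers. The sketch covers only the scanning step.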

Updated: 2020-05-29