Deep4mC: systematic assessment and computational prediction for DNA N4-methylcytosine sites by deep learning.
Briefings in Bioinformatics (IF 6.8), Pub Date: 2020-06-24, DOI: 10.1093/bib/bbaa099
Haodong Xu, Peilin Jia, Zhongming Zhao

DNA N4-methylcytosine (4mC) modification represents a novel epigenetic regulation. It is involved in various cellular processes, including DNA replication, cell cycle and gene expression, among others. In addition to experimental identification of 4mC sites, in silico prediction of 4mC sites in the genome has emerged as a promising alternative approach. In this study, we first reviewed the current progress in the computational prediction of 4mC sites and systematically evaluated, in six species, the predictive capacity of eight conventional machine learning algorithms and 12 feature types commonly used in previous studies. Using a representative benchmark dataset, we investigated the contribution of feature selection and a stacking approach to model construction, and found that feature optimization and proper reinforcement learning could improve performance. We next re-collected newly added 4mC sites in the genomes of the six species and developed a novel deep learning-based 4mC site predictor, Deep4mC. Deep4mC applies convolutional neural networks with four representative features. For species with small numbers of samples, we extended our deep learning framework with a bootstrapping method. Our evaluation indicated that Deep4mC achieved high accuracy and robust performance, with average area under the curve (AUC) values greater than 0.9 in all species (range: 0.9005–0.9722). Compared with previous tools, Deep4mC improved AUC values by 10.14% to 46.21% in these six species. A user-friendly web server (https://bioinfo.uth.edu/Deep4mC) was built for predicting putative 4mC sites in a genome.
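
For readers less familiar with the AUC metric reported above, the following is a minimal sketch of how an AUC value can be computed from predicted scores. It is not the authors' evaluation code. The function name `auc_score` and the toy labels and scores are hypothetical, chosen only for illustration. It relies on the standard interpretation of AUC as the probability that a randomly chosen positive site (here, a 4mC site) receives a higher score than a randomly chosen negative site.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 (1 = positive, e.g. a true 4mC site)
    scores: iterable of predicted scores, higher = more likely positive
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the paper): three positives, three negatives.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auc = auc_score(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.4f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives some context for the 0.9005–0.9722 range quoted in the abstract. The quadratic pairwise loop is kept for clarity; a rank-based computation is preferable for large datasets.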

Updated: 2020-06-27