G4detector: Convolutional Neural Network to Predict DNA G-Quadruplexes
IEEE/ACM Transactions on Computational Biology and Bioinformatics (IF 4.5), Pub Date: 2021-04-19, DOI: 10.1109/tcbb.2021.3073595
Mira Barshai¹, Alice Aubert², Yaron Orenstein¹

G-quadruplexes (G4s) are nucleic acid secondary structures that form within guanine-rich DNA or RNA sequences. G4 formation can affect chromatin architecture and gene regulation, and has been associated with genomic instability, genetic diseases, and cancer progression. Data from the G4-seq experiment provide unprecedented detail on G4 formation in the genome. Still, running the experimental protocol on a whole genome is expensive and time-consuming. Thus, it is highly desirable to have a computational method that predicts G4 formation in new DNA sequences or whole genomes. Here, we present G4detector, a new method based on a convolutional neural network that predicts G4s from DNA sequences. In addition to the sequence itself, we improved prediction accuracy by incorporating RNA secondary structure information. To train and test G4detector, we compiled novel high-throughput benchmarks from the genomes of multiple species measured by the G4-seq protocol. We show that G4detector outperforms existing methods for the same task on all benchmark datasets, detects G4s genome-wide with high accuracy, and generalizes from models trained on human measurements to various non-human species. The code and benchmarks are publicly available at github.com/OrensteinLab/G4detector.
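To make the general idea of CNN-based sequence scoring concrete, here is a minimal, dependency-free Python sketch of a convolutional forward pass over DNA: one-hot encoding, a bank of 1D filters with ReLU and global max pooling, and a logistic output. It is not the G4detector architecture. The alphabet order, filter shapes, hand-set weights, and the optional extra per-position input channel (a hypothetical stand-in for RNA secondary structure information) are all assumptions, and the function names `one_hot`, `add_channel`, `conv1d_max`, `sigmoid`, and `predict` are illustrative only.

```python
import math

# Assumed channel order for the one-hot encoding (hypothetical; not taken from G4detector).
ALPHABET = "ACGT"


def one_hot(seq):
    """Encode DNA as one 4-element row per base (A, C, G, T); any other character becomes all zeros."""
    return [[1.0 if base == b else 0.0 for b in ALPHABET] for base in seq.upper()]


def add_channel(encoded, track):
    """Append one extra per-position value to every row (e.g. a hypothetical structure score)."""
    if len(encoded) != len(track):
        raise ValueError("track length must match sequence length")
    return [row + [float(value)] for row, value in zip(encoded, track)]


def conv1d_max(encoded, kernel, bias=0.0):
    """Slide one filter (width rows of per-channel weights) along the sequence and return the
    maximum ReLU activation (global max pooling); returns 0.0 if the sequence is shorter than the filter."""
    width = len(kernel)
    best = 0.0
    for start in range(len(encoded) - width + 1):
        total = bias
        for offset, weights in enumerate(kernel):
            row = encoded[start + offset]
            # zip truncates: a 4-weight filter row ignores any extra input channel.
            total += sum(w * x for w, x in zip(weights, row))
        best = max(best, total)  # best starts at 0.0, so this also applies ReLU
    return best


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(seq, kernels, out_weights, out_bias=0.0, track=None):
    """Toy CNN forward pass: conv filters -> global max pooling -> logistic score in [0, 1]."""
    x = one_hot(seq)
    if track is not None:
        x = add_channel(x, track)
    features = [conv1d_max(x, k) for k in kernels]
    logit = out_bias + sum(w * f for w, f in zip(out_weights, features))
    return sigmoid(logit)


if __name__ == "__main__":
    g_run = [[0.0, 0.0, 1.0, 0.0]] * 3  # hand-set filter that fires on three consecutive Gs
    print(conv1d_max(one_hot("ATGGGA"), g_run))  # 3.0
    print(predict("ATGGGA", [g_run], [1.0], out_bias=-2.0))
```

The hand-set `g_run` filter only illustrates how a convolutional filter can respond to a guanine run; in a trained model the filter and output weights would be learned from labeled data, such as G4-seq measurements, rather than chosen by hand.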

Updated: 2021-04-19