DeepGRP: engineering a software tool for predicting genomic repetitive elements using Recurrent Neural Networks with attention
Algorithms for Molecular Biology (IF 1), Pub Date: 2021-08-23, DOI: 10.1186/s13015-021-00199-0
Fabian Hausmann, Stefan Kurtz

Repetitive elements make up a large part of eukaryotic genomes. For example, about 40 to 50% of the human, mouse and rat genomes is repetitive. Identifying and classifying repeats is therefore an important step in genome annotation. This step is traditionally performed using alignment-based methods, either in a de novo approach or by aligning the genome sequence to a species-specific set of repetitive sequences. Recently, Li (Bioinformatics 35:4408–4410, 2019) developed dna-brnn, a software tool that annotates repetitive sequences using a recurrent neural network trained on sample annotations of repetitive elements.

We have developed the methods of dna-brnn further and engineered a new software tool, DeepGRP. It combines the basic concepts of Li (Bioinformatics 35:4408–4410, 2019) with the attention mechanism, a technique developed for neural machine translation, and applies them to the nucleotide-level annotation of repetitive elements. An evaluation on the human genome shows a 20% improvement of the Matthews correlation coefficient for the predictions delivered by DeepGRP, compared to dna-brnn. DeepGRP predicts two more classes of repeats than dna-brnn and, trained on RepeatMasker-based data, is able to transfer repeat annotations to a different species (mouse). Additionally, we could show that DeepGRP predicts repeats that are annotated in the Dfam database but not by RepeatMasker. DeepGRP is highly scalable due to its implementation in the TensorFlow framework. For example, the GPU-accelerated version of DeepGRP is approx. 1.8 times faster than dna-brnn, approx. 8.6 times faster than RepeatMasker and over 100 times faster than HMMER searching for models of the Dfam database.

By incorporating methods from neural machine translation, DeepGRP achieves a consistent improvement in prediction quality compared to dna-brnn. Improved running times are obtained by employing TensorFlow as the implementation framework and by using GPUs. By incorporating two additional classes of repeats, DeepGRP provides more complete annotations, which were evaluated against three state-of-the-art tools for repeat annotation.
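The Matthews correlation coefficient used in the evaluation above can be illustrated with a small sketch. This is not DeepGRP's evaluation code: it assumes a simplified binary setting (each nucleotide is labelled 1 for "repeat" or 0 for "not a repeat"), whereas the tool predicts several repeat classes. The function name `binary_mcc` and the toy label vectors are hypothetical and serve only as an illustration.

```python
import math


def binary_mcc(truth, pred):
    """Matthews correlation coefficient for two equal-length 0/1 label sequences.

    Assumption for this sketch: one label per nucleotide, 1 = repeat, 0 = non-repeat.
    """
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: MCC is reported as 0 when a marginal count is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy example: an 8 bp window with a 4 bp annotated repeat (positions 2..5)
# and a prediction shifted one position to the left.
truth = [0, 0, 1, 1, 1, 1, 0, 0]
pred = [0, 1, 1, 1, 1, 0, 0, 0]
print(binary_mcc(truth, pred))  # 0.5
```

Unlike plain accuracy, the coefficient stays informative when repeat and non-repeat positions are unevenly represented, which is one reason it is a natural choice for nucleotide-level annotation.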

Updated: 2021-08-23