TERL: classification of transposable elements by convolutional neural networks.
Briefings in Bioinformatics (IF 9.5), Pub Date: 2020-09-08, DOI: 10.1093/bib/bbaa185
Murilo Horacio Pereira da Cruz, Douglas Silva Domingues, Priscila Tiemi Maeda Saito, Alexandre Rossi Paschoal, Pedro Henrique Bugatti

Transposable elements (TEs) are the most represented sequences in eukaryotic genomes. Few methods classify these sequences at deeper levels, such as the superfamily level, even though doing so could provide useful and detailed information about them. Most methods that classify TE sequences rely on handcrafted features, such as k-mers, and on homology-based search, which can be inefficient for classifying non-homologous sequences. Here we propose an approach, called Transposable Elements Representation Learner (TERL), that preprocesses one-dimensional sequences, transforms them into two-dimensional data (i.e., image-like representations of the sequences) and feeds them to deep convolutional neural networks. This classification method tries to learn the best representation of the input data in order to classify it correctly. We conducted six experiments to compare the performance of TERL against other methods. On sequences from RepBase, our approach obtained macro mean accuracy and F1-score of 96.4% and 85.8% at the superfamily level and 95.7% and 91.5% at the order level, respectively. On sequences from seven databases, it obtained macro mean accuracy and F1-score of 95.0% and 70.6% at the superfamily level and 89.3% and 73.9% at the order level, respectively. In the experiment on order-level classification of sequences from seven databases, TERL surpassed the accuracy, recall and specificity obtained by the other methods, and it was by far the fastest method in all experiments. Therefore, TERL can learn to predict any hierarchical level of the TE classification system, and it is about 20 times faster than TEclass and three orders of magnitude faster than PASTEC. Availability: https://github.com/muriloHoracio/TERL. Contact: murilocruz@alunos.utfpr.edu.br
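
The transformation of a one-dimensional sequence into two-dimensional, image-like data can be illustrated with a minimal sketch. The abstract does not say which encoding TERL uses, so the one-hot scheme below, the `ALPHABET` row order, the `one_hot_encode` helper and the fixed `width` are all assumptions made for illustration, not the authors' implementation.

```python
# Assumed symbol set and row order; the abstract does not specify either.
ALPHABET = "ACGT"

def one_hot_encode(seq, width):
    """Return a len(ALPHABET) x width matrix of 0/1 values for `seq`.

    Sequences longer than `width` are truncated; shorter ones are
    zero-padded on the right so every input has the same shape.
    """
    matrix = [[0] * width for _ in ALPHABET]
    for col, base in enumerate(seq.upper()[:width]):
        row = ALPHABET.find(base)
        if row >= 0:  # symbols outside ALPHABET (e.g. N) stay all-zero
            matrix[row][col] = 1
    return matrix

# Each row is one nucleotide, each column one sequence position.
encoded = one_hot_encode("ACGTN", width=8)
for base, row in zip(ALPHABET, encoded):
    print(base, row)
```

A fixed-size matrix of this kind can be treated like a single-channel image, which is the sort of input a convolutional network expects.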

Updated: 2020-09-10