Fast and accurate microRNA search using CNN.
BMC Bioinformatics (IF 3), Pub Date: 2019-12-27, DOI: 10.1186/s12859-019-3279-2
Xubo Tang, Yanni Sun

BACKGROUND There are many different types of microRNAs (miRNAs), and their functions are still under intensive research. A fundamental step in the functional annotation of a new miRNA is to classify it into a characterized miRNA family, such as those in Rfam and miRBase. With the accumulation of annotated miRNAs, it has become possible to use deep learning-based models to classify different types of miRNAs. In this work, we investigate several key issues associated with the successful application of deep learning models to miRNA classification. First, as secondary structure conservation is a prominent feature of noncoding RNAs, including miRNAs, we examine whether secondary structure-based encoding improves classification accuracy. Second, as there are many more non-miRNA sequences than miRNAs, instead of assigning a negative class to all non-miRNA sequences, we test whether the softmax output can distinguish in-distribution from out-of-distribution samples. Finally, we investigate whether deep learning models can correctly classify sequences from small miRNA families.
RESULTS We present our trained convolutional neural network (CNN) models for classifying miRNAs using different types of feature learning and encoding methods. In the first method, we explicitly encode the predicted secondary structure in a matrix. In the second method, we use only the primary sequence information and a one-hot encoding matrix. In addition, to reject sequences that should not be classified into the targeted miRNA families, we use a threshold derived from the softmax layer to exclude out-of-distribution sequences, which is an important feature for making this model useful on real transcriptomic data. A comparison with state-of-the-art ncRNA classification tools such as Infernal shows that our method achieves comparable sensitivity and accuracy while being significantly faster.
CONCLUSION Automatic feature learning in CNNs can lead to better classification accuracy and sensitivity for miRNA classification and annotation. The trained models and associated code are freely available at https://github.com/HubertTang/DeepMir.
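
Two of the ideas in the abstract, one-hot encoding of the primary sequence and rejecting out-of-distribution sequences with a softmax-derived threshold, can be illustrated with a minimal sketch. This is not the DeepMir code. The function names, the base ordering, the padding scheme, and the use of the maximum softmax probability as the rejection score are all assumptions made for illustration; the abstract does not specify how the threshold is derived.

```python
import math

ALPHABET = "ACGU"  # assumed column order for this sketch; any fixed order works


def one_hot_encode(seq, length):
    """Encode an RNA sequence as a length x 4 one-hot matrix.

    Sequences are truncated or zero-padded to `length` (an assumed scheme).
    """
    seq = seq.upper().replace("T", "U")[:length]
    matrix = [[0.0] * len(ALPHABET) for _ in range(length)]
    for i, base in enumerate(seq):
        j = ALPHABET.find(base)
        if j >= 0:  # ambiguous bases such as N are left as all-zero rows
            matrix[i][j] = 1.0
    return matrix


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_rejection(logits, threshold):
    """Return the index of the most probable family, or None when the top
    softmax probability falls below `threshold` (treated as out-of-distribution).
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None
```

In such a setup, the logits would come from the final layer of a trained classifier, and the threshold would typically be tuned on held-out data that contains both miRNA and non-miRNA sequences.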

Updated: 2019-12-27