DeepLncLoc: a deep learning framework for long non-coding RNA subcellular localization prediction based on subsequence embedding
Briefings in Bioinformatics (IF 9.5), Pub Date: 2021-08-17, DOI: 10.1093/bib/bbab360
Min Zeng 1 , Yifan Wu 1 , Chengqian Lu 1 , Fuhao Zhang 1 , Fang-Xiang Wu 2 , Min Li 1

Long non-coding RNAs (lncRNAs) are a class of RNA molecules longer than 200 nucleotides. A growing body of evidence shows that the subcellular localization of lncRNAs can provide valuable insights into their biological functions. Existing computational methods for predicting lncRNA subcellular localization use k-mer features to encode lncRNA sequences. However, sequence order information is lost when only k-mer features are used. We proposed a deep learning framework, DeepLncLoc, to predict lncRNA subcellular localization. In DeepLncLoc, we introduced a new subsequence embedding method that preserves the order information of lncRNA sequences. The subsequence embedding method first divides a sequence into consecutive subsequences, then extracts the patterns of each subsequence, and finally combines these patterns to obtain a complete representation of the lncRNA sequence. After that, a text convolutional neural network is employed to learn high-level features and perform the prediction task. Compared with traditional machine learning models, popular representation methods and existing predictors, DeepLncLoc achieved better performance, which shows that DeepLncLoc can effectively predict lncRNA subcellular localization. Our study not only presented a novel computational model for predicting lncRNA subcellular localization but also introduced a new subsequence embedding method that is expected to be applicable to other sequence-based prediction tasks. The DeepLncLoc web server is freely accessible at http://bioinformatics.csu.edu.cn/DeepLncLoc/, and source code and datasets can be downloaded from https://github.com/CSUBioGroup/DeepLncLoc.
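
The split-then-combine idea described in the abstract can be illustrated with a minimal sketch. It is not the DeepLncLoc implementation: the function names, the equal-length splitting rule, and the use of normalized k-mer frequencies as the per-subsequence "pattern" are all assumptions made for illustration, and the abstract does not specify how the authors extract patterns. The sketch only shows why keeping one vector per consecutive subsequence retains order information that a single whole-sequence k-mer profile discards.

```python
from itertools import product


def split_consecutive(seq, n_parts):
    """Split seq into n_parts consecutive, non-overlapping chunks of near-equal length.

    Hypothetical helper: the splitting rule is an assumption, not taken from the paper.
    """
    base, extra = divmod(len(seq), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks get one additional character.
        end = start + base + (1 if i < extra else 0)
        chunks.append(seq[start:end])
        start = end
    return chunks


def kmer_profile(subseq, k, alphabet="ACGU"):
    """Normalized k-mer frequency vector; k-mers containing other symbols are skipped."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(subseq) - k + 1):
        kmer = subseq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


def subsequence_embedding(seq, n_parts=4, k=2):
    """Return one pattern vector per consecutive subsequence, in sequence order.

    The stand-in pattern here is a k-mer frequency profile (an assumption).
    """
    return [kmer_profile(chunk, k) for chunk in split_consecutive(seq, n_parts)]


if __name__ == "__main__":
    forward, reverse = "AAAACCCC", "CCCCAAAA"
    # Whole-sequence 1-mer profiles cannot tell the two sequences apart ...
    print(kmer_profile(forward, 1) == kmer_profile(reverse, 1))
    # ... while the per-subsequence representation can.
    print(subsequence_embedding(forward, 2, 1) == subsequence_embedding(reverse, 2, 1))
```

The result is a list of rows, one per subsequence, ordered as in the original sequence. A matrix of this shape is the kind of input a text convolutional network could consume, although the network itself is outside the scope of this sketch.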

Updated: 2021-08-17