Identify ncRNA Subcellular Localization via Graph Regularized $k$-Local Hyperplane Distance Nearest Neighbor Model on Multi-Kernel Learning
IEEE/ACM Transactions on Computational Biology and Bioinformatics (IF 4.5), Pub Date: 2021-08-25, DOI: 10.1109/tcbb.2021.3107621
Haohao Zhou, Hao Wang, Jijun Tang, Yijie Ding, Fei Guo

Non-coding RNAs (ncRNAs) are RNAs that are not translated into proteins. Emerging evidence shows that many ncRNAs may participate in a wide range of biological processes and be widely involved in many types of cancer. Understanding their functions is therefore of great importance. As with proteins, the functions of ncRNAs depend on their subcellular localizations. Traditional high-throughput wet-lab methods for identifying subcellular localization are time-consuming and costly. In this paper, we propose a novel computational method based on multi-kernel learning that identifies multi-label ncRNA subcellular localizations via a graph regularized $k$-local hyperplane distance nearest neighbor algorithm. First, we construct six types of sequence-based feature descriptors and select the important feature vectors. Then, we build a multi-kernel learning model with the Hilbert-Schmidt independence criterion (HSIC) to obtain optimal weights for the various features. Next, we propose the graph regularized $k$-local hyperplane distance nearest neighbor algorithm (GHKNN) as a binary classification model for detecting one kind of ncRNA subcellular localization. Finally, we apply a One-vs-Rest strategy to decompose the multi-label ncRNA subcellular localization problem into binary subproblems. Our method achieves excellent performance on three ncRNA datasets and three human ncRNA datasets, and outperforms other leading machine learning methods. Compared with existing methods, our model also performs well, especially on small datasets. We expect this model to be useful for predicting subcellular localization and for studying important functional mechanisms of ncRNAs. Furthermore, we have established a user-friendly web server (http://ncrna.lbci.net/) implementing our method, which most experimental scientists can use easily.
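The HSIC step scores how well each feature kernel agrees with the localization labels. The abstract does not state the exact multi-kernel objective, so the following is only a minimal sketch under a simplifying assumption: each kernel's weight is taken proportional to its non-negative empirical HSIC, $\mathrm{tr}(KHLH)/(n-1)^2$ with $H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$, against the ideal label kernel $L = yy^\top$ for labels $y \in \{+1, -1\}$. The function names (`center`, `hsic`, `kernel_weights`, `combine_kernels`) and the toy kernels are hypothetical, not the authors' code.

```python
def center(K):
    """Return H K H with H = I - (1/n) 1 1^T, i.e. the doubly centred kernel matrix."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]


def hsic(K, L):
    """Empirical HSIC estimate tr(K H L H) / (n - 1)^2, computed as tr(Kc Lc)."""
    n = len(K)
    Kc, Lc = center(K), center(L)
    return sum(Kc[i][j] * Lc[j][i] for i in range(n) for j in range(n)) / (n - 1) ** 2


def kernel_weights(kernels, y):
    """Hypothetical heuristic (not the paper's optimisation): weight each kernel
    by its non-negative HSIC with the ideal label kernel y y^T, normalised to sum to 1."""
    L = [[yi * yj for yj in y] for yi in y]
    scores = [max(hsic(K, L), 0.0) for K in kernels]
    total = sum(scores)
    if total == 0.0:  # no kernel carries label information: fall back to uniform weights
        return [1.0 / len(kernels)] * len(kernels)
    return [s / total for s in scores]


def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices: K = sum_m w_m K_m."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    y = [1, 1, -1, -1]                               # toy binary labels for one localization
    K_aligned = [[a * b for b in y] for a in y]      # toy kernel that mirrors the labels
    z = [1, -1, 1, -1]
    K_unrelated = [[a * b for b in z] for a in z]    # toy kernel orthogonal to the labels
    print(kernel_weights([K_aligned, K_unrelated], y))  # [1.0, 0.0]
```

In this toy case the label-aligned kernel receives all the weight and the orthogonal kernel receives none; the weights produced by the paper's HSIC-based model may differ from this heuristic.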

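The binary classifier and the One-vs-Rest decomposition can be sketched together. The code below implements plain ridge-regularized $k$-local hyperplane distance nearest neighbor (HKNN) classification: for a query $x$ and a candidate class, take its $k$ nearest training samples $N_1, \dots, N_k$ of that class with centroid $\bar{N}$, solve $\min_{\alpha} \lVert x - \bar{N} - \sum_p \alpha_p (N_p - \bar{N}) \rVert^2 + \lambda \lVert \alpha \rVert^2$, and report the norm of the residual. It assumes the graph regularization term of GHKNN is omitted, because the abstract does not define it, so this is not the authors' GHKNN. Each localization becomes one binary problem (samples carrying that label versus all others), and the label is predicted when the query is closer to the positive local hyperplane. The helper names, `k = 3`, `lam = 1.0`, and the two-dimensional toy data are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination; A is symmetric positive definite when lam > 0."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix [A | b]
    for c in range(n):
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def local_hyperplane_distance(x, samples, k=3, lam=1.0):
    """Ridge-regularized distance from x to the affine hull of its k nearest samples.
    Assumption: plain HKNN; the graph regularization term of GHKNN is omitted."""
    nbrs = sorted(samples, key=lambda s: math.dist(x, s))[:k]
    d, m = len(x), len(nbrs)
    centroid = [sum(s[i] for s in nbrs) / m for i in range(d)]
    V = [[s[i] - centroid[i] for i in range(d)] for s in nbrs]  # one direction per neighbour
    r0 = [x[i] - centroid[i] for i in range(d)]
    # Normal equations of min_alpha ||r0 - sum_p alpha_p V[p]||^2 + lam * ||alpha||^2
    G = [[sum(a * b for a, b in zip(V[p], V[q])) + (lam if p == q else 0.0)
          for q in range(m)] for p in range(m)]
    rhs = [sum(a * b for a, b in zip(V[p], r0)) for p in range(m)]
    alpha = solve(G, rhs)
    residual = [r0[i] - sum(alpha[p] * V[p][i] for p in range(m)) for i in range(d)]
    return math.sqrt(sum(v * v for v in residual))


def predict_labels(x, X, Y, labels, k=3, lam=1.0):
    """One-vs-Rest: predict label l if x is closer to the 'has l' local hyperplane
    than to the 'lacks l' one."""
    predicted = set()
    for label in labels:
        pos = [s for s, ys in zip(X, Y) if label in ys]
        neg = [s for s, ys in zip(X, Y) if label not in ys]
        if pos and neg and (local_hyperplane_distance(x, pos, k, lam)
                            < local_hyperplane_distance(x, neg, k, lam)):
            predicted.add(label)
    return predicted


# Hypothetical 2-D toy data; real inputs would be the selected sequence features.
X_TRAIN = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6), (3, 3)]
Y_TRAIN = [{"nucleus"}] * 4 + [{"cytoplasm"}] * 4 + [{"nucleus", "cytoplasm"}]
LABELS = ["nucleus", "cytoplasm"]

if __name__ == "__main__":
    print(predict_labels((0.5, 0.5), X_TRAIN, Y_TRAIN, LABELS))  # {'nucleus'}
```

Because each label is decided by its own binary problem, a query may receive zero, one, or several localizations, which is what the multi-label setting requires.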
Updated: 2021-08-25