ESDA: An Improved Approach to Accurately Identify Human snoRNAs for Precision Cancer Therapy
Current Bioinformatics (IF 2.4), Pub Date: 2019-12-31, DOI: 10.2174/1574893614666190424162230
Yan-mei Dong, Jia-hao Bi, Qi-en He, Kai Song

Background: Small nucleolar RNAs (snoRNAs) are small RNA molecules approximately 60-300 nucleotides long. They have been shown to play important roles in cancer occurrence and progression, so identifying new snoRNAs as quickly and accurately as possible is of great clinical importance.

Objective: A novel algorithm, ESDA (Elastically Sparse Partial Least Squares Discriminant Analysis), is proposed to improve the speed and performance of distinguishing snoRNAs from other RNAs in the human genome.

Methods: In the ESDA algorithm, kernel features are first selected from variables extracted from both primary sequences and secondary structures, in order to optimize the extracted information. The selected features are then used as input variables to the SPLSDA (sparse partial least squares discriminant analysis) algorithm, which trains the final classification model to distinguish snoRNA sequences from other human RNAs. Because no prior biological knowledge is required to optimize the classification model, ESDA is a practical method, especially for completely new sequences.
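
As a rough illustration of what variables drawn from both a primary sequence and a secondary structure might look like, the sketch below computes dinucleotide frequencies from an RNA sequence and two summary statistics from a dot-bracket structure string. The specific features, the function names, and the dot-bracket input are assumptions made for illustration; the abstract does not state which variables ESDA actually extracts or how it selects among them.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"


def kmer_frequencies(seq, k=2):
    """Relative frequency of every k-mer over ACGU.

    Hypothetical primary-sequence features, not the paper's variable set.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only k-mers made of A, C, G, U contribute to the total.
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def structure_features(dot_bracket):
    """Summary statistics of a dot-bracket string.

    Hypothetical secondary-structure features, not the paper's variable set.
    """
    n = len(dot_bracket)
    paired = sum(ch in "()" for ch in dot_bracket)
    # Count each run of consecutive '(' characters as one stem opening.
    stems = sum(
        1
        for i, ch in enumerate(dot_bracket)
        if ch == "(" and (i == 0 or dot_bracket[i - 1] != "(")
    )
    return {"paired_fraction": paired / n if n else 0.0, "stem_count": stems}


def feature_vector(seq, dot_bracket):
    """Combine sequence and structure features into one dictionary."""
    features = kmer_frequencies(seq)
    features.update(structure_features(dot_bracket))
    return features
```

A call such as `feature_vector("GGGAAACCC", "(((...)))")` returns 16 dinucleotide frequencies plus the two structure statistics. Vectors of this kind could then be passed to a feature-selection step and a classifier.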

Results: To test the identification performance of ESDA, 89 human H/ACA snoRNAs and 269 human C/D snoRNAs were used as positive samples and 3403 non-snoRNAs as negative samples. For H/ACA snoRNA identification, the sensitivity and specificity reached 99.6% and 98.8%, respectively. For C/D snoRNAs, they were 96.1% and 98.3%, respectively. ESDA was also compared with other widely used algorithms and classifiers: SnoReport, RF (Random Forest), DWD (Distance Weighted Discrimination), and SVM (Support Vector Machine). The largest improvement in accuracy obtained by ESDA was 25.1%.
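
For readers unfamiliar with the metrics reported above, the following minimal sketch shows the standard definitions of sensitivity, specificity, and accuracy in terms of confusion-matrix counts. The function names are illustrative, and the counts in the usage note are made up for the example; they are not taken from the paper.

```python
def sensitivity(tp, fn):
    """Fraction of true positives (e.g. snoRNAs) that are correctly identified."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of true negatives (e.g. non-snoRNAs) that are correctly rejected."""
    return tn / (tn + fp)


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)
```

With hypothetical counts of 90 true positives, 10 false negatives, 950 true negatives, and 50 false positives, `sensitivity(90, 10)` gives 0.9 and `specificity(950, 50)` gives 0.95. With far more negative than positive samples, as in the data set described here, accuracy is dominated by the negative class, which is why sensitivity and specificity are reported separately.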

Conclusion: These results demonstrate the superior performance of ESDA and make it promising for identifying snoRNAs in the further development of precision medicine for cancer.




Updated: 2019-12-31