Classification of Breast Cancer and Breast Neoplasm Scenarios Based on Machine Learning and Sequence Features from lncRNAs–miRNAs-Diseases Associations
Interdisciplinary Sciences: Computational Life Sciences (IF 3.9), Pub Date: 2021-06-21, DOI: 10.1007/s12539-021-00451-6
Juan Gutiérrez-Cárdenas 1, 2 , Zenghui Wang 1

Non-coding RNAs, such as long non-coding RNAs (lncRNAs) and microRNAs (miRNAs), play an undeniable role in several diseases, for example in the formation of neoplasms and in cancer. Studying them is nevertheless hampered by the scarcity of validated datasets and by imbalance in the data. We found that research on associations between miRNAs, lncRNAs, and diseases is limited, or is carried out separately for each RNA type. In addition, few investigations combine Machine Learning models with genomic sequence features extracted from miRNAs and lncRNAs, compared with those that rely on genomic expression or Deep Learning techniques. In this paper, we propose a framework that uses supervised and unsupervised machine learning models with genomic sequence features, such as k-mers, sequence alignments, and folding energy values, to validate the association of miRNAs and lncRNAs with breast cancer and neoplasm scenarios. Using a One-Class SVM for outlier detection and comparing two supervised models, SVM and Random Forest, we obtained accuracies of 95.44% for the One-Class model, and 88.79% and 99.65% for the SVM and Random Forest models, respectively. These results, which are comparable to those found in the existing literature, indicate a promising path for studying sequence-feature interactions together with Machine Learning models.
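
To make the k-mer features mentioned in the abstract concrete, here is a minimal sketch of how a sequence could be turned into a fixed-length vector of k-mer frequencies. It is an illustration only, not the authors' implementation: the function name `kmer_frequencies`, the choice of k, the RNA alphabet `ACGU`, and the toy sequence are all assumptions, and the abstract does not state which k values or normalization the paper uses.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=3, alphabet="ACGU"):
    """Return normalized k-mer frequencies in lexicographic alphabet order.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Treat DNA-style input (T) as RNA (U) so both notations map to one alphabet.
    seq = sequence.upper().replace("T", "U")
    # Slide a window of width k over the sequence.
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Ignore windows containing symbols outside the alphabet (e.g. N).
    allowed = set(alphabet)
    counts = Counter(w for w in windows if set(w) <= allowed)
    total = sum(counts.values())
    # Enumerate every possible k-mer so all sequences yield equal-length vectors.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    return [counts[km] / total if total else 0.0 for km in kmers]


if __name__ == "__main__":
    toy_sequence = "AUGGCUAUGGCA"  # made-up sequence, not real data
    features = kmer_frequencies(toy_sequence, k=2)
    print(len(features))  # one entry per possible 2-mer over a 4-letter alphabet
```

Vectors built this way have the same length for every sequence (4^k entries for a four-letter alphabet), so they could be fed to models such as a One-Class SVM, an SVM, or a Random Forest alongside other features.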

Updated: 2021-06-21