A semi-supervised machine learning framework for microRNA classification.
Human Genomics (IF 4.5), Pub Date: 2019-10-22, DOI: 10.1186/s40246-019-0221-7
Mohsen Sheikh Hassani, James R Green

BACKGROUND: MicroRNAs (miRNAs) are a family of short, non-coding RNAs that have been linked to critical cellular activities, most notably regulation of gene expression. The identification of miRNA is a cross-disciplinary approach that requires both computational identification methods and wet-lab validation experiments, making it a resource-intensive procedure. While numerous machine learning methods have been developed to increase classification accuracy and thus reduce validation costs, most methods use supervised learning and thus require large labeled training data sets, often not feasible for less-sequenced species. On the other hand, there is now an abundance of unlabeled RNA sequence data due to the emergence of high-throughput wet-lab experimental procedures, such as next-generation sequencing.

RESULTS: This paper explores the application of semi-supervised machine learning for miRNA classification in order to maximize the utility of both labeled and unlabeled data. We here present the novel combination of two semi-supervised approaches: active learning and multi-view co-training. Results across six diverse species show that this multi-stage semi-supervised approach is able to improve classification performance using very small numbers of labeled instances, effectively leveraging the available unlabeled data.

CONCLUSIONS: The proposed semi-supervised miRNA classification pipeline holds the potential to identify novel miRNA with high recall and precision while requiring very small numbers of previously known miRNA. Such a method could be highly beneficial when studying miRNA in newly sequenced genomes of niche species with few known examples of miRNA.
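As a rough illustration of how active learning and multi-view co-training can be chained, the sketch below runs both stages on synthetic data. It is not the authors' pipeline: the nearest-centroid classifier, the two Gaussian feature views (`view_a`, `view_b`), the simulated oracle, and every parameter name are assumptions made for this example only. Stage 1 asks an oracle (standing in for wet-lab validation) to label the most uncertain candidates; stage 2 lets one classifier per view add its most confident predictions to a shared training set as pseudo-labels.

```python
import random

def make_sample(label, rng):
    # Synthetic stand-in for a candidate: two hypothetical feature views.
    mu = 3.0 * label
    view_a = [rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)]
    view_b = [rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)]
    return view_a, view_b, label

class NearestCentroid:
    """Toy binary classifier; confidence is the relative distance margin."""
    def fit(self, X, y):
        self.centroids = {}
        for lab in (0, 1):
            rows = [x for x, l in zip(X, y) if l == lab]
            self.centroids[lab] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, x):
        d = [sum((a - b) ** 2 for a, b in zip(x, self.centroids[lab])) ** 0.5
             for lab in (0, 1)]
        label = 0 if d[0] <= d[1] else 1
        confidence = abs(d[0] - d[1]) / (d[0] + d[1] + 1e-12)
        return label, confidence

def both(sample):
    # Concatenate the two views into one feature vector.
    return sample[0] + sample[1]

def run_pipeline(seed=0, al_rounds=6, ct_rounds=5, per_round=4):
    rng = random.Random(seed)
    labeled = [make_sample(l, rng) for l in (0, 0, 1, 1)]  # tiny labeled seed set
    pool = [make_sample(rng.randint(0, 1), rng) for _ in range(200)]  # treated as unlabeled
    test = [make_sample(rng.randint(0, 1), rng) for _ in range(200)]

    # Stage 1: active learning by uncertainty sampling.
    for _ in range(al_rounds):
        clf = NearestCentroid().fit([both(s) for s in labeled], [s[2] for s in labeled])
        idx = min(range(len(pool)), key=lambda i: clf.predict(both(pool[i]))[1])
        labeled.append(pool.pop(idx))  # simulated oracle reveals the true label

    # Stage 2: co-training; each view pseudo-labels its most confident candidates.
    train = list(labeled)
    for _ in range(ct_rounds):
        for view in (0, 1):
            clf = NearestCentroid().fit([s[view] for s in train], [s[2] for s in train])
            ranked = sorted(range(len(pool)),
                            key=lambda i: -clf.predict(pool[i][view])[1])
            # Pop from the highest index down so remaining indices stay valid.
            for i in sorted(ranked[:per_round], reverse=True):
                a, b, _true_label_unused = pool.pop(i)
                pseudo = clf.predict((a, b)[view])[0]
                train.append((a, b, pseudo))

    final = NearestCentroid().fit([both(s) for s in train], [s[2] for s in train])
    correct = sum(final.predict(both(s))[0] == s[2] for s in test)
    return {
        "accuracy": correct / len(test),
        "oracle_queries": al_rounds,
        "pseudo_labeled": len(train) - len(labeled),
        "train_size": len(train),
    }

if __name__ == "__main__":
    print(run_pipeline(seed=0))
```

With the default settings the training set grows from 4 seed examples to 50 (6 oracle-labeled, 40 pseudo-labeled). Real data would need a proper classifier, real feature views, and stopping criteria, none of which are specified in the abstract.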

Updated: 2020-04-22