Machine Learning Approaches on High Throughput NGS Data to Unveil Mechanisms of Function in Biology and Disease.
Cancer Genomics & Proteomics (IF 2.5), Pub Date: 2021-09-05, DOI: 10.21873/cgp.20284
Vasileios C Pezoulas 1 , Orsalia Hazapis 2 , Nefeli Lagopati 2, 3 , Themis P Exarchos 4, 5 , Andreas V Goules 6 , Athanasios G Tzioufas 6 , Dimitrios I Fotiadis 1 , Ioannis G Stratis 7 , Athanasios N Yannacopoulos 8 , Vassilis G Gorgoulis 3, 9, 10, 11, 12

In this review, the fundamentals of machine learning (ML) and data mining (DM) are summarized, together with techniques for distilling knowledge from state-of-the-art omics experiments. This includes an introduction to the basic mathematical principles of unsupervised and supervised learning methods, dimensionality reduction techniques, and deep neural network architectures, along with their applications in bioinformatics. The case studies evaluated mainly involve next-generation sequencing (NGS) experiments, such as deciphering gene expression from total and single-cell RNA sequencing (scRNA-seq) analysis; for the latter, recent artificial intelligence (AI) methods for investigating cell sub-types and biomarkers, as well as imputation techniques, are described. ML schemes have also been investigated for providing information on transcription factor (TF) binding sites, chromatin organization patterns, and RNA-binding proteins (RBPs), and analyses of RNA sequence and structure and ML-based predictions of three-dimensional protein structure are described as well. Furthermore, we summarize recent methods that use ML in clinical oncology, combining current omics data with pharmacogenomics to determine personalized treatments. With this review, we wish to provide the scientific community with a thorough survey of the main novel ML applications that take into account the latest achievements in genomics, thus unraveling the fundamental mechanisms of biology toward the understanding and cure of diseases.

Updated: 2021-09-05