A survey on single and multi omics data mining methods in cancer data classification.
Journal of Biomedical Informatics (IF 4.5) · Pub Date: 2020-06-07 · DOI: 10.1016/j.jbi.2020.103466
Zahra Momeni¹, Esmail Hassanzadeh¹, Mohammad Saniee Abadeh², Riccardo Bellazzi³

Data analytics is routinely used to support biomedical research in all areas, with a particular focus on the most relevant clinical conditions, such as cancer. Bioinformatics approaches, in particular, have been used to characterize the molecular aspects of diseases. In recent years, numerous studies of cancer have been based on single- and multi-omics data. Single-omics studies, for example, have employed a diverse set of data types, such as gene expression, DNA methylation, and miRNA, to name only a few. Even so, a significant part of the literature reports studies of gene expression using microarray datasets. Single-omics data have a very large number of attributes and very few samples. This characteristic makes them a paradigmatic example of an under-sampled, small-n, large-p machine learning problem. An important goal of single-omics data analysis is to find, among the available data, the genes most relevant for potential use in clinics and research. This problem has been addressed through gene selection, one of the pre-processing steps in data mining. Analyses that use only one type of data (single-omics) often miss the complexity of the molecular phenomena underlying the disease. As a result, they provide limited and sometimes unreliable information about disease mechanisms. Therefore, in recent years, researchers have sought to build more complex models that obtain more reliable results from multi-omics data. The most important challenge in achieving this, however, is data integration. In this paper, we provide a comprehensive overview of the challenges in single- and multi-omics analysis of cancer data, focusing on gene selection and data integration methods.
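The abstract frames single-omics analysis as a small-n, large-p problem handled by gene selection, and multi-omics analysis as a data-integration problem. The following is a minimal sketch of both ideas, assuming a two-class problem, a simple filter method (ranking each feature by its absolute Welch t-statistic), and "early" integration by concatenating the selected, standardized features of each omics block. These are illustrative choices, not the survey authors' method; the helper names (`select_top_k`, `early_integrate`, `make_synthetic_blocks`) and the synthetic data (20 samples, 200 features per block) are hypothetical.

```python
import math
import random
from statistics import mean, stdev, variance


def welch_t(a, b):
    """Welch's t-statistic comparing two independent samples."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def select_top_k(X, y, k):
    """Filter-style gene selection: rank features by |t| between classes 0 and 1."""
    scores = []
    for j in range(len(X[0])):
        g0 = [row[j] for row, label in zip(X, y) if label == 0]
        g1 = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append((abs(welch_t(g0, g1)), j))
    scores.sort(reverse=True)  # highest |t| first
    return [j for _, j in scores[:k]]


def zscore_columns(X):
    """Standardize each column to mean 0 and unit sample standard deviation."""
    cols = list(zip(*X))
    stats = [(mean(c), stdev(c) or 1.0) for c in cols]  # guard constant columns
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in X]


def early_integrate(blocks, y, k_per_block):
    """Select top-k features per omics block, standardize them, and concatenate."""
    integrated = [[] for _ in y]
    selected = {}
    for name, X in blocks.items():
        idx = select_top_k(X, y, k_per_block)
        selected[name] = idx
        sub = zscore_columns([[row[j] for j in idx] for row in X])
        for i, row in enumerate(sub):
            integrated[i].extend(row)
    return integrated, selected


def make_synthetic_blocks(n=20, p=200, seed=0):
    """Hypothetical small-n, large-p data: two omics blocks, two informative features each."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]  # balanced binary labels
    shifts = {"expression": {3: 5.0, 7: 5.0}, "methylation": {11: -5.0, 42: -5.0}}
    blocks = {}
    for name, informative in shifts.items():
        # Gaussian noise everywhere; informative features are shifted in class 1 only.
        blocks[name] = [
            [rng.gauss(0.0, 1.0) + (informative.get(j, 0.0) if label == 1 else 0.0)
             for j in range(p)]
            for label in y
        ]
    return blocks, y


if __name__ == "__main__":
    blocks, y = make_synthetic_blocks()
    Z, selected = early_integrate(blocks, y, k_per_block=2)
    print(selected)           # feature indices chosen from each block
    print(len(Z), len(Z[0]))  # 20 samples x 4 integrated features
```

Concatenation is only the simplest integration strategy; it lets one classifier see every block but ignores differences in scale and structure beyond per-feature standardization. In a real evaluation the selection step should be repeated inside each cross-validation fold, because ranking features on the full dataset before assessing a classifier leaks label information and inflates accuracy estimates, a risk that grows when n is this small.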



Updated: 2020-06-07