Integration of multi-omics data to mine cancer-related gene modules
Journal of Bioinformatics and Computational Biology ( IF 0.9 ) Pub Date : 2019-11-25 , DOI: 10.1142/s0219720019500380
Peng Li 1, 2 , Maozu Guo 2 , Bo Sun 1

The identification of cancer-related genes is a major research goal, with implications for determining the pathogenesis of cancer and identifying biomarkers for early diagnosis and treatment. In this study, we propose a network-based method for mining cancer-related genes that integrates multi-omics data, including gene expression, DNA copy number variation, DNA methylation, transcription factor, miRNA, and lncRNA data. First, a random forest-based feature selection method is used to integrate the multi-omics data and identify the key regulatory factors that affect gene expression, from which genome-wide regulatory networks are constructed. Next, the regulatory networks of key candidate genes in variant and non-variant samples are compared to generate a differential expression regulatory network, which contains the set of abnormally regulated genes associated with each key candidate gene. Then, with functional similarity introduced as a distance metric for gene sets, a density-based clustering method is used to mine gene modules related to cancer. We applied this method to LUSC (lung squamous cell carcinoma) and mined cancer-related gene modules composed of 20 genes. GO function and KEGG pathway analyses indicated that the modules were closely related to cancer. A survival analysis verified that the mined gene modules can effectively distinguish between high- and low-risk groups. Overall, these results suggest that the proposed method can be used to identify cancer-related gene modules, providing a basis for the development of biomarkers for diagnosis and treatment.
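
The differential-network and density-based clustering steps described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the authors' implementation: the gene, regulator, and annotation names are hypothetical toy data, the differential network is taken to be the edges present in only one of the two networks, Jaccard distance over annotation terms stands in for the functional similarity measure (which the abstract does not specify), and the clustering is a plain DBSCAN-style procedure with arbitrarily chosen `eps` and `min_pts`.

```python
from collections import deque

def differential_edges(variant_net, normal_net):
    """Edges (regulator, target) present in exactly one of the two networks."""
    return set(variant_net) ^ set(normal_net)

def jaccard_distance(a, b):
    """1 - Jaccard similarity of two annotation-term sets (stand-in metric)."""
    union = a | b
    if not union:
        return 1.0  # assumption: genes without annotations are maximally distant
    return 1.0 - len(a & b) / len(union)

def dbscan(items, dist, eps, min_pts):
    """Minimal DBSCAN; returns {item: cluster id}, with -1 meaning noise."""
    def neighbors(p):
        return [q for q in items if dist(p, q) <= eps]  # includes p itself

    labels = {}
    cluster = -1
    for p in items:
        if p in labels:
            continue
        nbrs = neighbors(p)
        if len(nbrs) < min_pts:
            labels[p] = -1  # provisional noise, may become a border point
            continue
        cluster += 1
        labels[p] = cluster
        queue = deque(nbrs)
        while queue:
            q = queue.popleft()
            if labels.get(q) == -1:
                labels[q] = cluster  # noise reachable from a core point
            if q in labels:
                continue
            labels[q] = cluster
            q_nbrs = neighbors(q)
            if len(q_nbrs) >= min_pts:  # q is a core point: keep expanding
                queue.extend(q_nbrs)
    return labels

# Hypothetical toy regulatory networks as sets of (regulator, target) edges.
variant_net = {("TF1", "G1"), ("TF1", "G2"), ("miR1", "G3"), ("TF2", "G4"),
               ("TF2", "G5"), ("lnc1", "G6"), ("TF3", "G7")}
normal_net = {("TF3", "G7")}

diff = differential_edges(variant_net, normal_net)
genes = sorted({target for _, target in diff})

# Hypothetical annotation terms per gene (placeholders, not real GO terms).
annotations = {
    "G1": {"t1", "t2", "t3"},
    "G2": {"t1", "t2", "t3", "t4"},
    "G3": {"t2", "t3", "t4"},
    "G4": {"t7", "t8"},
    "G5": {"t7", "t8", "t9"},
    "G6": {"t5"},
}

labels = dbscan(
    genes,
    lambda a, b: jaccard_distance(annotations[a], annotations[b]),
    eps=0.4,
    min_pts=2,
)
print(labels)
```

On this toy input, G1 to G3 form one module, G4 and G5 a second, and G6 is left as noise. In practice the distance would come from a proper functional similarity measure, and the clustering parameters would need tuning on real data.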

Updated: 2019-11-25