MLDEG: A Machine Learning Approach to Identify Differentially Expressed Genes Using Network Property and Network Propagation
IEEE/ACM Transactions on Computational Biology and Bioinformatics (IF 4.5), Pub Date: 2021-03-22, DOI: 10.1109/tcbb.2021.3067613
Ji Hwan Moon 1 , Sangseon Lee 2 , Minwoo Pak 3 , Benjamin Hur 4 , Sun Kim 5

Motivation: Identifying differentially expressed genes (DEGs) in transcriptome data is an important task. However, the performance of existing DEG methods varies significantly across data sets measured under different conditions, and no single statistical or machine learning model for DEG detection performs consistently well on data sets of different traits. In addition, setting a cutoff value for the significance of differential expression is one of the confounding factors in determining DEGs.

Results: We address these problems by developing an ensemble model that refines the heterogeneous and inconsistent results of existing methods by taking into account network information such as network propagation and network property. DEG candidates that existing tools predict with weak evidence are re-classified by the proposed ensemble model. Tested on 10 RNA-seq data sets downloaded from the Gene Expression Omnibus (GEO), our method ranked first in detecting ground truth (GT) genes in eight data sets and found almost all GT genes in six data sets. In contrast, the performance of all existing methods varied significantly across the 10 data sets. Because of its design, our method can naturally accommodate any new DEG method.

Availability: The source code of our method is available at https://github.com/jihmoon/MLDEG.
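The abstract does not specify how network propagation is computed. As a rough illustration of the general idea only, the sketch below runs a random walk with restart over a toy undirected gene network, spreading an initial DEG evidence score from a seed gene to its neighbors. The function name `propagate`, the gene names, the toy edges, and the restart probability of 0.5 are all assumptions made for this example; none of it is taken from the MLDEG source code.

```python
def propagate(adjacency, seed_scores, restart=0.5, tol=1e-9, max_iter=1000):
    """Random walk with restart on an undirected network (illustrative only).

    adjacency   : dict mapping each gene to the set of its neighbors;
                  assumed symmetric (if b is in adjacency[a], a is in adjacency[b]).
    seed_scores : dict mapping genes to non-negative initial evidence.
    restart     : probability of jumping back to the seed distribution.
    """
    genes = sorted(adjacency)
    degree = {g: len(adjacency[g]) for g in genes}

    total = sum(seed_scores.get(g, 0.0) for g in genes)
    if total <= 0:
        raise ValueError("seed_scores must contain positive evidence")
    # Normalize the initial evidence so that it sums to 1.
    p0 = {g: seed_scores.get(g, 0.0) / total for g in genes}

    p = dict(p0)
    for _ in range(max_iter):
        nxt = {}
        for g in genes:
            # Each neighbor n passes an equal share of its score to g.
            incoming = sum(p[n] / degree[n] for n in adjacency[g])
            nxt[g] = (1.0 - restart) * incoming + restart * p0[g]
        delta = sum(abs(nxt[g] - p[g]) for g in genes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: a simple path geneA - geneB - geneC - geneD.
toy_network = {
    "geneA": {"geneB"},
    "geneB": {"geneA", "geneC"},
    "geneC": {"geneB", "geneD"},
    "geneD": {"geneC"},
}

# Only geneA carries initial evidence of differential expression.
scores = propagate(toy_network, {"geneA": 1.0})

for gene in sorted(scores, key=scores.get, reverse=True):
    print(f"{gene}\t{scores[gene]:.4f}")
```

In this toy setting, genes closer to the seed end up with higher propagated scores. A score of this kind is one plausible network-derived feature for re-evaluating weakly supported DEG candidates; how MLDEG actually derives and uses its network features is documented in the paper and the linked repository.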

Updated: 2021-03-22