MLDEG: A Machine Learning Approach to Identify Differentially Expressed Genes Using Network Property and Network Propagation.
IEEE/ACM Transactions on Computational Biology and Bioinformatics (IF 3.6), Pub Date: 2021-03-22, DOI: 10.1109/tcbb.2021.3067613
Ji Hwan Moon, Sangseon Lee, Minwoo Pak, Benjamin Hur, Sun Kim

MOTIVATION: Identifying differentially expressed genes (DEGs) in transcriptome data is an important task. However, the performance of existing DEG methods varies significantly across data sets measured under different conditions, and no single statistical or machine learning model for DEG detection performs consistently well on data sets of different traits. In addition, the choice of a cutoff value for the significance of differential expression is itself one of the confounding factors in determining DEGs.

RESULTS: We address these problems by developing an ensemble model that refines the heterogeneous and inconsistent results of the existing methods by taking into account network information such as network propagation and network properties. DEG candidates that the existing tools predict with only weak evidence are re-classified by the proposed ensemble model for the transcriptome data. Tested on 10 RNA-seq datasets downloaded from the Gene Expression Omnibus (GEO), our method ranked first in detecting ground truth (GT) genes in eight datasets and found almost all GT genes in six datasets. In contrast, the performance of every existing method varied significantly across the 10 datasets. Because of its design principle, our method can naturally accommodate any new DEG method.
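The abstract does not spell out how network propagation is computed. As a rough illustration only, the sketch below uses random walk with restart, one common form of network propagation, to spread seed evidence over a toy gene network. The function name `propagate`, the `restart` parameter, the toy network, and the seed scores are all assumptions made for this example and are not taken from MLDEG.

```python
def propagate(adjacency, seeds, restart=0.5, tol=1e-9, max_iter=1000):
    """Random walk with restart on an undirected network (illustrative sketch).

    adjacency: dict mapping each node to a list of its neighbours
    seeds:     dict mapping some nodes to a non-negative starting score
    Returns a dict mapping every node to its propagated score.
    """
    nodes = sorted(adjacency)
    total = sum(seeds.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("at least one seed needs a positive score")

    # Normalise the seed scores so they form a probability distribution.
    p0 = {n: seeds.get(n, 0.0) / total for n in nodes}
    p = dict(p0)

    for _ in range(max_iter):
        # Restart step: a fraction of the mass jumps back to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        # Walk step: the remaining mass is split evenly among neighbours.
        for n in nodes:
            nbrs = adjacency[n]
            if not nbrs:
                # Isolated node: keep its walking mass in place.
                nxt[n] += (1.0 - restart) * p[n]
                continue
            share = (1.0 - restart) * p[n] / len(nbrs)
            for m in nbrs:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: a chain of four genes A - B - C - D.
toy_network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
# Hypothetical seed: only gene A has evidence from an existing DEG tool.
scores = propagate(toy_network, {"A": 1.0})
for gene in sorted(scores):
    print(gene, round(scores[gene], 4))
```

In this toy chain, genes closer to the seed receive higher scores, which is the intuition behind using propagated scores as extra evidence for weakly supported DEG candidates. How MLDEG combines such scores with network properties in its ensemble is not described in the abstract.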

Updated: 2021-03-22