NetMix: A Network-Structured Mixture Model for Reduced-Bias Estimation of Altered Subnetworks
Journal of Computational Biology ( IF 1.7 ) Pub Date : 2021-05-20 , DOI: 10.1089/cmb.2020.0435
Matthew A Reyna 1 , Uthsav Chitra 2 , Rebecca Elyanow 2, 3 , Benjamin J Raphael 2

A classic problem in computational biology is the identification of altered subnetworks: subnetworks of an interaction network that contain genes/proteins that are differentially expressed, highly mutated, or otherwise aberrant compared with other genes/proteins. Numerous methods have been developed to solve this problem under various assumptions, but the statistical properties of these methods are often unknown. For example, some widely used methods are reported to output very large subnetworks that are difficult to interpret biologically. In this work, we formulate the identification of altered subnetworks as the problem of estimating the parameters of a class of probability distributions that we call the Altered Subset Distribution (ASD). We derive a connection between a popular method, jActiveModules, and the maximum likelihood estimator (MLE) of the ASD. We show that the MLE is statistically biased, explaining the large subnetworks output by jActiveModules. Based on these insights, we introduce NetMix, an algorithm that uses Gaussian mixture models to obtain less biased estimates of the parameters of the ASD. We demonstrate that NetMix outperforms existing methods in identifying altered subnetworks on both simulated and real data, including the identification of differentially expressed genes from both microarray and RNA-seq experiments and the identification of cancer driver genes in somatic mutation data.
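
The mixture-model idea in the abstract can be illustrated with a small sketch. This is not the authors' NetMix implementation. It assumes, for illustration only, that each gene has a score drawn from N(mu, 1) if the gene is altered and from N(0, 1) otherwise, and it fits a two-component Gaussian mixture by expectation-maximization to estimate the altered fraction and the altered mean. The function names, the simulated data, and the unit-variance assumption are all hypothetical, and the network-constrained step that selects a connected subnetwork is omitted.

```python
import math
import random


def normal_pdf(x, mean):
    """Density of a unit-variance normal distribution at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def fit_two_component_mixture(scores, n_iter=200):
    """EM for the mixture (1 - alpha) * N(0, 1) + alpha * N(mu, 1).

    Returns (alpha, mu): the estimated altered fraction and altered mean.
    The background component is held fixed at N(0, 1) (an assumption).
    """
    alpha, mu = 0.5, max(scores)  # crude initial guesses
    for _ in range(n_iter):
        # E-step: posterior probability that each score is "altered".
        resp = []
        for x in scores:
            p_alt = alpha * normal_pdf(x, mu)
            p_bg = (1.0 - alpha) * normal_pdf(x, 0.0)
            resp.append(p_alt / (p_alt + p_bg))
        # M-step: update the mixing weight and the altered mean.
        total = sum(resp)
        alpha = total / len(scores)
        mu = sum(r * x for r, x in zip(resp, scores)) / total
    return alpha, mu


def simulate_scores(n=2000, n_altered=100, mu=3.0, seed=0):
    """Hypothetical data: the first n_altered scores come from N(mu, 1)."""
    rng = random.Random(seed)
    return [rng.gauss(mu if i < n_altered else 0.0, 1.0) for i in range(n)]


scores = simulate_scores()
alpha_hat, mu_hat = fit_two_component_mixture(scores)
# Estimated number of altered genes implied by the mixing weight.
estimated_size = round(alpha_hat * len(scores))
print(f"alpha_hat={alpha_hat:.3f}, mu_hat={mu_hat:.2f}, size={estimated_size}")
```

In this sketch the subnetwork size is estimated from the mixture weight rather than by maximizing a score over candidate subnetworks. A method built on this idea would then still need a separate step that uses the interaction network to choose which genes form the subnetwork.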

Updated: 2021-05-22