A family of parsimonious mixtures of multivariate Poisson‐lognormal distributions for clustering multivariate count data
Stat (IF 1.7), Pub Date: 2020-08-25, DOI: 10.1002/sta4.310
Sanjeena Subedi, Ryan P. Browne

Multivariate count data are commonly encountered through high‐throughput sequencing technologies in bioinformatics, text mining, or sports analytics. Although the Poisson distribution seems a natural fit to these count data, its multivariate extension is computationally expensive. In most cases, mutual independence among the variables is assumed; however, this fails to take into account the correlation among the variables usually observed in the data. Recently, mixtures of multivariate Poisson‐lognormal (MPLN) models have been used to analyze such multivariate count measurements with a dependence structure. In the MPLN model, each count is modeled using an independent Poisson distribution conditional on a latent multivariate Gaussian variable. Owing to this hierarchical structure, the MPLN model can account for over‐dispersion as opposed to the traditional Poisson distribution and allows for correlation between the variables. Rather than relying on a Monte Carlo‐based estimation framework, which is computationally inefficient, a fast variational expectation–maximization (EM)‐based framework is used here for parameter estimation. Further, a family of parsimonious mixtures of Poisson‐lognormal distributions is proposed by decomposing the covariance matrix and imposing constraints on these decompositions. Utility of such models is shown using simulated and benchmark datasets.
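The hierarchical structure described in the abstract can be illustrated with a short simulation. This is only a sketch of a single MPLN component, assuming a log link (each Poisson rate is the exponential of the corresponding latent Gaussian coordinate). The function name `sample_mpln` and all parameter values are hypothetical, chosen for illustration; this is not the authors' variational EM estimation code.

```python
import numpy as np


def sample_mpln(n, mu, sigma, rng):
    """Draw n count vectors from one multivariate Poisson-lognormal component.

    mu and sigma are the mean vector and covariance matrix of the latent
    Gaussian layer (illustrative parameters, not taken from the paper).
    """
    # Latent layer: one multivariate Gaussian vector per observation.
    theta = rng.multivariate_normal(mu, sigma, size=n)
    # Observed layer: counts are independent Poisson draws given theta,
    # with rate exp(theta) (assumed log link).
    return rng.poisson(np.exp(theta))


rng = np.random.default_rng(0)
mu = np.array([1.0, 1.5])
sigma = np.array([[0.5, 0.4],
                  [0.4, 0.5]])
counts = sample_mpln(5000, mu, sigma, rng)

# Empirical summaries: a plain Poisson would have variance equal to the mean,
# and independent Poissons would have correlation near zero.
means = counts.mean(axis=0)
variances = counts.var(axis=0)
corr = np.corrcoef(counts, rowvar=False)[0, 1]
```

In draws like these, the sample variances exceed the sample means (over-dispersion) and the off-diagonal entry of `sigma` induces positive correlation between the two count variables, which are the two properties the abstract attributes to the MPLN model.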

Updated: 2020-10-30