Poisson PCA: Poisson measurement error corrected PCA, with application to microbiome data
Biometrics (IF 1.4), Pub Date: 2020-10-02, DOI: 10.1111/biom.13384
Toby Kenney, Hong Gu, Tianshu Huang

In this paper, we study the problem of computing a principal component analysis of data affected by Poisson noise. We assume samples are drawn from independent Poisson distributions. We want to estimate principal components of a fixed transformation of the latent Poisson means. Our motivating example is microbiome data, though the methods apply to many other situations. We develop a semiparametric approach to correct the bias of variance estimators, both for untransformed and transformed (with particular attention to log-transformation) Poisson means. Furthermore, we incorporate methods for correcting different exposure or sequencing depth in the data. In addition to identifying the principal components, we also address the nontrivial problem of computing the principal scores in this semiparametric framework. Most previous approaches tend to take a more parametric line: for example, fitting a log-normal Poisson (PLN) model. We compare our method with the PLN approach and find that in many cases our method is better at identifying the main principal components of the latent log-transformed Poisson means, and as a further major advantage, takes far less time to compute. Comparing methods on real and simulated data, we see that our method also appears to be more robust to outliers than the parametric method.
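
The simplest case mentioned in the abstract, untransformed Poisson means with no exposure correction, can be illustrated with a standard identity: if the counts X are independent Poisson draws given latent means Λ, then Cov(X) = Cov(Λ) + diag(E[Λ]), so subtracting the sample means from the diagonal of the sample covariance removes the Poisson contribution. The sketch below is only an illustration of that identity, not the authors' implementation; the function names, the simulated data, and all parameter values are assumptions made for the example, and the log-transformed case, sequencing-depth correction, and score computation described in the paper are not covered.

```python
import numpy as np


def poisson_corrected_covariance(counts):
    """Moment estimate of Cov(latent means) from Poisson counts.

    Uses Cov(X) = Cov(Lambda) + diag(E[Lambda]) for X | Lambda ~ Poisson(Lambda)
    with conditionally independent coordinates. Hypothetical helper for
    illustration only.
    """
    counts = np.asarray(counts, dtype=float)
    sample_mean = counts.mean(axis=0)
    sample_cov = np.cov(counts, rowvar=False)
    return sample_cov - np.diag(sample_mean)


def leading_component(cov):
    """Return the eigenvector belonging to the largest eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, -1]


# Simulated example (all values are assumptions): four features whose latent
# means vary along a single direction, with very different baseline abundances.
rng = np.random.default_rng(0)
n_samples = 20000
baseline = np.array([10.0, 20.0, 40.0, 80.0])
direction = np.array([1.0, 1.0, -1.0, -1.0]) / 2.0  # unit vector
latent_scores = rng.normal(0.0, 4.0, size=n_samples)
latent_means = baseline + np.outer(latent_scores, direction)
latent_means = np.clip(latent_means, 0.1, None)  # keep Poisson rates positive
counts = rng.poisson(latent_means)

naive = np.cov(counts, rowvar=False)
corrected = poisson_corrected_covariance(counts)

# Absolute cosine between each estimated leading component and the true one.
naive_alignment = abs(leading_component(naive) @ direction)
corrected_alignment = abs(leading_component(corrected) @ direction)
print("naive alignment:    ", naive_alignment)
print("corrected alignment:", corrected_alignment)
```

In this simulation the uncorrected covariance is dominated by the Poisson variance of the most abundant feature, so its leading eigenvector leans toward that feature, while the corrected estimate is expected to align more closely with the latent direction. Note that the corrected matrix is not guaranteed to be positive semidefinite in finite samples.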

Updated: 2020-10-02