A comparison study on modeling of clustered and overdispersed count data for multiple comparisons, Journal of Applied Statistics


A comparison study on modeling of clustered and overdispersed count data for multiple comparisons
Journal of Applied Statistics (IF 1.2), Pub Date: 2020-07-03, DOI: 10.1080/02664763.2020.1788518
Jochen Kruppa 1,2, Ludwig Hothorn 3


Data collected in many scientific fields are count data. One way to analyze such data is to compare the individual levels of the treatment factor using multiple comparisons. However, the measured individuals are often clustered, e.g. by litter or rearing. This clustering must be taken into account by estimating the parameters with a repeated-measurement model. In addition, ignoring the overdispersion to which count data are prone inflates the type I error rate. We carry out simulation studies under several data settings and compare multiple contrast tests based on parameter estimates from generalized estimating equations and from generalized linear mixed models, evaluating coverage and rejection probabilities. We generate overdispersed, clustered count data in small samples, as observed in many biological settings. We find that generalized estimating equations outperform generalized linear mixed models if the sandwich variance estimator is correctly specified. Furthermore, generalized linear mixed models show convergence problems under certain data settings, although some model implementations are less affected. Finally, we use an example of genetic data to demonstrate the application of the multiple contrast test and the problems caused by ignoring strong overdispersion.
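To make the notion of overdispersed, clustered counts concrete, the following is a minimal sketch of one way such data could be generated. It is not the authors' simulation design: the generating mechanism (a normal random intercept per cluster plus a gamma multiplier per individual, feeding a Poisson draw), all parameter values, and the function names `sample_poisson`, `simulate_counts` and `dispersion_index` are assumptions made for illustration. A large sample is used here only so that the dispersion index is stable; the study itself concerns small samples.

```python
import math
import random
import statistics


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the moderate means used in this sketch.
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_counts(rng, n_clusters, cluster_size, mean, cluster_sd=0.0, shape=None):
    """Simulate counts for n_clusters clusters of cluster_size individuals.

    cluster_sd: SD of a normal random intercept on the log scale (assumed).
    shape: gamma shape of a mean-one multiplier per individual (assumed);
           None gives plain Poisson variation within a cluster.
    """
    counts = []
    for _ in range(n_clusters):
        b = rng.gauss(0.0, cluster_sd) if cluster_sd > 0 else 0.0
        for _ in range(cluster_size):
            g = rng.gammavariate(shape, 1.0 / shape) if shape else 1.0
            counts.append(sample_poisson(rng, mean * math.exp(b) * g))
    return counts


def dispersion_index(counts):
    """Sample variance divided by sample mean; about 1 for Poisson data."""
    return statistics.variance(counts) / statistics.mean(counts)


rng = random.Random(2020)  # fixed seed so the sketch is reproducible
poisson_counts = simulate_counts(rng, 200, 5, mean=5.0)
clustered_counts = simulate_counts(rng, 200, 5, mean=5.0, cluster_sd=0.5, shape=2.0)

print("Poisson dispersion index:  ", round(dispersion_index(poisson_counts), 2))
print("Clustered dispersion index:", round(dispersion_index(clustered_counts), 2))
```

Under these assumed settings the plain Poisson counts have a variance close to their mean, while the clustered counts have a variance several times their mean. A model that assumes Poisson variance for the second data set would understate the standard errors, which is the mechanism behind the inflated type I error rate described above.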


Updated: 2020-07-03
