Analysis of partially observed clustered data using generalized estimating equations and multiple imputation.
The Stata Journal: Promoting communications on statistics and Stata (IF 3.2). Pub Date: 2014-10-01
Kathryn M Aloisio, Sonja A Swanson, Nadia Micali, Alison Field, Nicholas J Horton

Clustered data arise in many settings, particularly within the social and biomedical sciences. As an example, multiple-source reports are commonly collected in child and adolescent psychiatric epidemiologic studies, where researchers use various informants (e.g., parent and adolescent) to provide a holistic view of a subject's symptomatology. Fitzmaurice et al. (1995) have described estimation of multiple-source models using a standard generalized estimating equation (GEE) framework. However, these studies often have missing data because of the additional stages of consent and assent required. The usual GEE is unbiased when data are missing completely at random (MCAR) in the sense of Little and Rubin (2002). This is a strong assumption that may not be tenable. Other options, such as weighted generalized estimating equations (WEEs), are computationally challenging when missingness is non-monotone. Multiple imputation is an attractive method for fitting incomplete-data models that requires only the less restrictive missing at random (MAR) assumption. Previously, estimation with partially observed clustered data was computationally challenging; however, recent developments in Stata have facilitated the use of these methods in practice. We demonstrate how to use multiple imputation in conjunction with a GEE to investigate the prevalence of disordered eating symptoms in adolescents, as reported by parents and adolescents, as well as factors associated with concordance and prevalence. The methods are motivated by the Avon Longitudinal Study of Parents and their Children (ALSPAC), a cohort study that enrolled more than 14,000 pregnant mothers in 1991-92 and has followed the health and development of their children at regular intervals. Point estimates were fairly similar to those from the GEE under MCAR, but the MAR model had smaller standard errors and required less stringent assumptions about missingness.
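
The abstract does not show how results from the imputed datasets are combined. As a rough illustration only, the sketch below applies Rubin's rules, the standard way to pool a coefficient across m imputed datasets. It is written in Python rather than Stata and is not the authors' code; the function name `pool_rubin` and all numbers are hypothetical, standing in for one GEE coefficient and its squared standard error from each imputed dataset.

```python
import math


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates: point estimate from each imputed-data fit
    variances: squared standard error from each imputed-data fit
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists with at least two imputations")

    # Pooled point estimate: the average of the per-imputation estimates.
    q_bar = sum(estimates) / m

    # Within-imputation variance: the average of the per-imputation variances.
    w_bar = sum(variances) / m

    # Between-imputation variance: sample variance of the estimates.
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)

    # Total variance adds the between-imputation part, inflated by (1 + 1/m).
    total_var = w_bar + (1 + 1 / m) * b
    return q_bar, math.sqrt(total_var)


# Hypothetical values, not taken from the paper: five imputations of one
# log-odds coefficient, each with standard error 0.10.
example_estimates = [0.50, 0.54, 0.46, 0.52, 0.48]
example_variances = [0.10 ** 2] * 5
pooled_estimate, pooled_se = pool_rubin(example_estimates, example_variances)
```

The pooled standard error is never smaller than the square root of the average within-imputation variance, because the between-imputation term carries the extra uncertainty due to the missing values.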

Updated: 2019-11-01