A causal inference framework for cancer cluster investigations using publicly available data
The Journal of the Royal Statistical Society, Series A (Statistics in Society) (IF 1.5). Pub Date: 2020-04-25. DOI: 10.1111/rssa.12567
Rachel C Nethery 1, Yue Yang 1, Anna J Brown 2, Francesca Dominici 1

Often, a community becomes alarmed when high rates of cancer are noticed, and residents suspect that the cancer cases could be caused by a known source of hazard. In response, the US Centers for Disease Control and Prevention recommend that departments of health perform a standardized incidence ratio (SIR) analysis to determine whether the observed cancer incidence is higher than expected. This approach has several limitations that are well documented in the existing literature. We propose a novel causal inference framework for cancer cluster investigations, rooted in the potential outcomes framework. Assuming that a source of hazard representing a potential cause of increased cancer rates in the community is identified a priori, we focus our approach on a causal inference estimand which we call the causal SIR. The causal SIR is a ratio defined as the expected cancer incidence in the exposed population divided by the expected cancer incidence for the same population under the (counterfactual) scenario of no exposure. To estimate the causal SIR we need to overcome two main challenges: first, we must identify unexposed populations that are as similar as possible to the exposed population to inform estimation of the expected cancer incidence under the counterfactual scenario of no exposure, and, second, publicly available data on cancer incidence for these unexposed populations are often available only at a much higher level of spatial aggregation (e.g. county) than what is desired (e.g. census block group). We overcome the first challenge by relying on matching. We overcome the second challenge by building a Bayesian hierarchical model that borrows information from other sources to impute cancer incidence at the desired level of spatial aggregation. In simulations, our statistical approach was shown to provide dramatically better results, i.e. less bias and better coverage, than the current approach to SIR analyses.
We apply our proposed approach to investigate whether trichloroethylene vapour exposure has caused increased cancer incidence in Endicott, New York.
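
To make the two ratios in the abstract concrete, the following is a minimal Python sketch under stated assumptions. It is not the authors' estimator: the abstract describes matching combined with a Bayesian hierarchical model, whereas this sketch replaces the model with a simple plug-in, nearest-neighbour matching with replacement on Euclidean covariate distance. The function names, the tuple layout and all numbers are hypothetical and chosen only for illustration.

```python
import math


def classic_sir(observed_cases, exposed_pop_by_stratum, reference_rate_by_stratum):
    """Conventional SIR: observed cases divided by the cases expected if the
    community experienced the reference population's stratum-specific rates."""
    expected = sum(
        pop * reference_rate_by_stratum[stratum]
        for stratum, pop in exposed_pop_by_stratum.items()
    )
    if expected <= 0:
        raise ValueError("expected case count must be positive")
    return observed_cases / expected


def nearest_unexposed(exposed_cov, unexposed_covs):
    """Index of the unexposed unit closest in covariates (Euclidean distance)."""
    return min(
        range(len(unexposed_covs)),
        key=lambda j: math.dist(exposed_cov, unexposed_covs[j]),
    )


def matched_plug_in_sir(exposed, unexposed):
    """Plug-in ratio in the spirit of the causal SIR (an assumed simplification).

    Each unit is a tuple (covariates, cases, population). The counterfactual
    'no exposure' count for an exposed unit is approximated by the incidence
    rate of its nearest unexposed match, scaled to the exposed unit's population.
    Matching is done with replacement.
    """
    unexposed_covs = [cov for cov, _, _ in unexposed]
    observed = sum(cases for _, cases, _ in exposed)
    counterfactual = 0.0
    for cov, _, pop in exposed:
        j = nearest_unexposed(cov, unexposed_covs)
        _, match_cases, match_pop = unexposed[j]
        counterfactual += pop * match_cases / match_pop
    if counterfactual <= 0:
        raise ValueError("counterfactual case count must be positive")
    return observed / counterfactual


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(classic_sir(30, {"young": 1000, "old": 500}, {"young": 0.01, "old": 0.02}))

    exposed_units = [((0.2, 0.5), 12, 1000), ((0.8, 0.1), 9, 500)]
    unexposed_units = [
        ((0.25, 0.5), 8, 1000),
        ((0.9, 0.1), 10, 1000),
        ((0.5, 0.9), 3, 400),
    ]
    print(matched_plug_in_sir(exposed_units, unexposed_units))
```

The difference between the two functions is the denominator. The conventional SIR takes its expected count from a broad reference population, while the matched version takes it from unexposed units chosen to resemble the exposed ones, which is the role matching plays in the framework described above. The sketch assumes unit-level counts are already available for the unexposed units; the abstract's second challenge is precisely that they often are not.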

Updated: 2020-06-19