BY-NC-ND 3.0 license Open Access Published by De Gruyter September 6, 2017

Counterfactual-Based Prevented and Preventable Proportions

  • Kentaro Yamada and Manabu Kuroki

Abstract

Prevented and preventable fractions have been widely used in medical science to evaluate the proportion of new diseases that can be averted by a protective exposure. However, most existing formulas used in practical situations cannot be interpreted as proportions without any further assumptions because they are obtained according to different target populations and may fall outside the range [0.000,1.000]. To solve this problem, this paper proposes counterfactual-based prevented and preventable proportions. When both causal effects and observed probabilities are available, we show that the proposed measures are identifiable under the negative monotonicity assumption. Additionally, when the negative monotonicity assumption is violated, we formulate the bounds on the proposed measures. We also show that negative monotonicity together with exogeneity induces equivalence between the proposed measures and existing measures.

1 Introduction

In medical science, an important issue is to assess how much of a disease can be averted by a protective exposure among subjects who would otherwise suffer from the disease. For example, in a 1970 measles epidemic in Texarkana, Texas, USA, some cases occurred in children who had been vaccinated against measles. At that time, the public questioned the effectiveness of the measles vaccine, and Landrigan [1] reported that its efficacy was 95.9%. According to Gregg [2], this implies that vaccination prevented 96% of the cases that would have occurred in vaccinated children, had they not been vaccinated. In a 2004 varicella outbreak in Nebraska, USA, some cases occurred in children who had been vaccinated against varicella, and the Centers for Disease Control and Prevention [3] reported that the vaccine’s effectiveness was approximately 81%. According to Gregg [2], this implies that vaccination prevented approximately 80% of the cases that would have occurred in vaccinated children, had they not been vaccinated.

To measure such an impact in vaccine trials, Greenwood and Yule [4] proposed vaccine efficacy, which is defined as the proportion of undiseased subjects in the vaccinated group who, under ideal conditions, would become diseased if they had not been vaccinated. Weinberg and Szilagyi [5] stated that vaccine efficacy is best measured by double-blind, randomized, clinical controlled trials, such as those performed for the pentavalent and monovalent rotavirus vaccines. For exposures or interventions other than vaccines, vaccine efficacy is equivalent to the prevented fraction among the exposed [2]. In the field of epidemiology, Miettinen [6] formulated the prevented fraction to evaluate the effectiveness of a protective exposure. In biostatistics, Gargiullo et al. [7] proposed the variance estimator of the prevented fraction based on maximum likelihood theory for cross-sectional studies. Benichou [8] reviewed adjusted methods of estimating the prevented fraction and other ratio scales while controlling for other factors. Additionally, Laaksonen [9] discussed the calculation of the prevented fraction and its related measures based on a cohort study design. Recently, Greenland [10] discussed concepts and pitfalls in measuring and interpreting attributable fractions and prevented fractions. As observed from these vigorous discussions, the evaluation problem of vaccine efficacy and the prevented fraction is a classical topic, but remains of great interest in medical statistics even now.

When we reviewed early literature on the prevented fraction, we found that Miettinen [6] described one of the fundamental concepts of the prevented fraction that provides the motivation for this paper:

“the preventive (prevented) fraction is the proportion of cases prevented by the factor among the totality of cases that would have developed in the absence of the protective factor”[6].

As Miettinen’s statement shows, the prevented fraction should be formulated within the framework of causal inference rather than treated routinely as a purely statistical quantity. In fact, Porta et al. [11, p. 224] and Boslaugh [12, p. 59] also stated

“In a study of a total population, the prevented fraction of the incidence rate is computed as $(I_u - I_p)/I_u$ or $P_e(1 - I_p/I_u)$, where $I_p$ is the rate of the disease in the population, $I_u$ is what the rate would be if everyone were not exposed, and $P_e$ is the prevalence of exposure” [11].

and

“Whereas the denominators of attributable fractions are observable numbers of cases, the denominators of prevented fractions are nonobservable (counterfactual)” [12].

However, to the best of our knowledge, most biostatistical researchers and practitioners do not appear to take this counterfactual aspect into account. Additionally, because measures in previous studies on the prevented fraction are formulated based on different target populations, they cannot be interpreted as proportions without any further assumptions and may fall outside the range [0.000,1.000]. Similar to but different from the prevented fraction, according to Porta et al. [11, p. 223], the preventable fraction is defined as the proportion of the disease in the population that would be prevented if the whole population were exposed to the factor. However, this measure is also formulated based on different target populations, and thus cannot be interpreted as a proportion.

To solve this problem, this paper defines prevented and preventable proportions based on the potential outcome model. The proposed measures are formulated based on a single target population and can be interpreted as proportions. With a motivation similar to ours, Suzuki et al. [13] reformulated the excess fraction, attributable fraction and etiologic fraction, and derived the bounds on these measures. However, they did not focus on the prevented and preventable fractions. Additionally, they derived their bounds under exogeneity. By contrast, we assume that both causal effects and observed probabilities are available. This assumption, which is weaker than exogeneity, enables us to evaluate the proposed measures when the causal effect is identifiable from observational studies, as well as experimental studies. In this situation, we show that the proposed measures are identifiable under the negative monotonicity assumption. When the negative monotonicity assumption is violated, we formulate the bounds within which the proposed measures must lie. The results show that, under the exogeneity assumption, the traditional prevented and preventable fractions are never larger than the corresponding proposed measures. Furthermore, we show that negative monotonicity together with exogeneity induces equivalence between the proposed measures and existing measures. The extensions to the case of multi-categorical exposure and the applications to experimental and observational studies are provided in the appendices.

2 Existing measures

Let X be a dichotomous protective exposure taking values in $\{x_0\ \text{(unexposed)},\ x_1\ \text{(exposed)}\}$ and Y be a dichotomous outcome taking values in $\{y_0\ \text{(undiseased)},\ y_1\ \text{(diseased)}\}$. Additionally, pr(x,y), pr(x), and pr(y|x) indicate the joint probability of (X,Y)=(x,y), the marginal probability of X=x, and the conditional probability of Y=y given X=x, respectively; similar notation is used for the other probabilities.

In this section, we introduce three representative prevented and preventable fractions: the prevented fraction in the population (PFp), the prevented fraction among the exposed (PFe), and the preventable fraction (PaFp).

In order to evaluate the prevented impact of a protective exposure on the total number of diseases that would be expected if exposure was absent, Miettinen [6] proposed the PFp as

$$\mathrm{PFp}=\frac{\mathrm{pr}(y_1\mid x_0)-\mathrm{pr}(y_1\mid x_1)}{\mathrm{pr}(y_1\mid x_0)}\,\mathrm{pr}(x_1). \tag{1}$$

According to Miettinen [6], PFp is interpreted as “the proportion of cases prevented by the (protective) factor among the totality of cases that would have developed in the absence of the protective factor.”

Additionally, biostatistical researchers and practitioners may want to evaluate the proportion of potential exposed cases that were prevented by the exposure [6, 14]. Miettinen [6] and Kleinbaum et al. [14] formulated PFe as

$$\mathrm{PFe}=\frac{\mathrm{pr}(y_1\mid x_0)-\mathrm{pr}(y_1\mid x_1)}{\mathrm{pr}(y_1\mid x_0)}. \tag{2}$$

Unlike PFp, PFe focuses on the prevented impact of a protective exposure on the expected number of exposed cases that would have developed in the absence of the protective factor [14]. According to Spasoff [15], PFe is identical to the relative risk reduction that was proposed by Sackett et al. [16] in clinical epidemiology and evaluates how much of the risk is reduced in the exposed group compared with the unexposed group. It is also equivalent to vaccine efficacy, which was originally discussed by Greenwood and Yule [4] in the context of vaccine trials.

Although the prevented fraction is frequently referred to as the preventable fraction, these concepts should be distinguished. For example, Boslaugh [12] stated that

“With protective risk factors, researchers and policy analysts are often interested in how much of the current disease risk in the total target population is potentially preventable if everyone in the population were exposed. This measure, called the preventable fraction, should be distinguished from the prevented fraction. Whereas the PFp reflects the previous impact of being exposed in the population, the PaFp reflects the potential impact in the future if everyone were to become exposed” [12].

Porta et al. [11] and other researchers defined PaFp as

$$\mathrm{PaFp}=\frac{\mathrm{pr}(y_1)-\mathrm{pr}(y_1\mid x_1)}{\mathrm{pr}(y_1)}=\frac{\mathrm{pr}(y_1\mid x_0)-\mathrm{pr}(y_1\mid x_1)}{\mathrm{pr}(y_1)}\,\mathrm{pr}(x_0), \tag{3}$$

which is interpreted as “the proportion of the disease (in the population) that would be prevented if the whole population were exposed to the factor” [11]. To illustrate the difference between PFp and PaFp, Spasoff [15] considered a hypothetical epidemic in a population in which 40% of subjects had been immunized, and in which the cumulative incidence of the disease was 10% in unimmunized subjects and 2% in immunized subjects, that is, $\mathrm{pr}(x_1)=1-\mathrm{pr}(x_0)=0.400$, $\mathrm{pr}(y_1\mid x_1)=0.020$, $\mathrm{pr}(y_1\mid x_0)=0.100$ and $\mathrm{pr}(y_1)=\mathrm{pr}(y_1\mid x_1)\mathrm{pr}(x_1)+\mathrm{pr}(y_1\mid x_0)\mathrm{pr}(x_0)=0.068$. Then, PFp is given by 0.320, which implies that the immunization program has reduced the incidence of the disease by 32% in the population, that is, the immunization program was 32% effective. By contrast, PaFp is given by 0.706. According to Spasoff [15], this implies that 71% of diseases could have been prevented by increasing the coverage of the immunization program from 40% to 100%.
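The arithmetic of this example can be checked with a minimal Python sketch of eqs. (1) to (3). The function name `prevented_fractions` and its argument names are illustrative choices, not notation from the paper; the code assumes a protective exposure with a nonzero risk in the unexposed group.

```python
def prevented_fractions(p_exposed, risk_exposed, risk_unexposed):
    """Return (PFp, PFe, PaFp) following eqs. (1)-(3).

    p_exposed      : pr(x1), prevalence of the protective exposure
    risk_exposed   : pr(y1 | x1)
    risk_unexposed : pr(y1 | x0), assumed to be nonzero
    """
    p_unexposed = 1.0 - p_exposed
    # Marginal risk pr(y1) by the law of total probability.
    risk_total = risk_exposed * p_exposed + risk_unexposed * p_unexposed
    risk_diff = risk_unexposed - risk_exposed
    pf_p = risk_diff / risk_unexposed * p_exposed      # eq. (1)
    pf_e = risk_diff / risk_unexposed                  # eq. (2)
    paf_p = (risk_total - risk_exposed) / risk_total   # eq. (3)
    return pf_p, pf_e, paf_p


# Spasoff's hypothetical epidemic: 40% immunized, risks 2% vs. 10%.
pf_p, pf_e, paf_p = prevented_fractions(0.400, 0.020, 0.100)
print(f"PFp={pf_p:.3f}  PFe={pf_e:.3f}  PaFp={paf_p:.3f}")
# prints: PFp=0.320  PFe=0.800  PaFp=0.706
```

Swapping the two risks (so that the exposure appears harmful) makes all three values negative, which illustrates why these fractions are not proportions without further assumptions.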

Although Porta et al. [11] and some researchers introduced the prevented and preventable fractions based on counterfactual concepts, it is unclear how such a concept is introduced into the formulation mathematically. Additionally, it is noted that PFp, PFe and PaFp are formulated based on different target populations: the exposed and unexposed groups. Thus, the traditional prevented and preventable fractions may fall outside the range [0.000,1.000] without any further assumptions and they cannot be interpreted as proportions. To ensure that these measures are within the range [0.000,1.000], biostatistical researchers and practitioners often assume that “exposure to a given factor is believed to protect against a disease” [11], which is supported by negative monotonicity that will be introduced in Section 4.1, or that

$$\frac{\mathrm{pr}(y_1\mid x_1)}{\mathrm{pr}(y_1\mid x_0)}\le 1 \tag{4}$$

holds [2, 6]. However, generally, the former assumption cannot be verified from observed data because the phrase “protect against a disease” conveys much more than statistical association and implies a cause-and-effect relationship between the exposure and the disease. Additionally, when unmeasured confounders exist, the former may not imply the latter and vice versa.

3 Proposed measures

In this section, we introduce the potential outcome variables that will be used to define the counterfactual-based prevented and preventable proportions.

3.1 Preliminaries

In the potential outcome framework [17, 18], the ith of N subjects has both a potential outcome variable $Y_{x_1}(i)$ that would have resulted if X had been $x_1$, and a potential outcome variable $Y_{x_0}(i)$ that would have resulted if X had been $x_0$. To discuss our problem, this paper assumes the stable unit treatment value assumption (SUTVA), which can be summarized as follows: (i) the exposure status of any subject does not affect the outcomes of the other subjects (non-interference) and (ii) the exposures for all subjects are comparable (no variation in treatment). Thus, when N subjects in a study are considered as random samples from some population, $Y_{x_1}(i)$ and $Y_{x_0}(i)$ can be referred to as the values of random variables $Y_{x_1}$ and $Y_{x_0}$, respectively. The causal effect is defined as $\mathrm{pr}(Y_x=y)=\mathrm{pr}(y_x)$, where $y_x$ indicates the counterfactual statement that “variable Y would have the value y, had X been x.” Additionally, the potential outcome $Y_{x_1}$ is observed only if X is $x_1$, and $Y_{x_0}$ is observed only if X is $x_0$. This is the consistency property [18, 19], which is formulated as

$$(X=x)\ \Rightarrow\ (Y_x=Y). \tag{5}$$

When a randomized experiment is conducted, X is independent of $(Y_{x_0},Y_{x_1})$. This condition is often called exogeneity. Under exogeneity, the causal effect is identifiable and is given by $\mathrm{pr}(y_x)=\mathrm{pr}(y\mid x)$, where “identifiable” means that the causal quantities can be consistently estimated from a joint distribution of observed variables. By contrast, when a randomized experiment is difficult to conduct and only observed data are available, we can still identify the causal effect according to the strongly ignorable treatment assignment (SITA) condition [20] or the back-door criterion in the context of graph-based causal inference [18]. That is, for the exposure X, if there exists a set of observed covariates Z such that X is conditionally independent of $(Y_{x_0},Y_{x_1})$ given Z, we say that treatment assignment is strongly ignorable given Z, or Z satisfies the SITA condition. Then, the causal effect $\mathrm{pr}(y_x)$ is identifiable and is given by

$$\mathrm{pr}(y_x)=\sum_{\boldsymbol{z}}\mathrm{pr}(y\mid x,\boldsymbol{z})\,\mathrm{pr}(\boldsymbol{z}). \tag{6}$$

Other than the SITA condition, there are various types of identification conditions for causal effects $\mathrm{pr}(y_x)$. For details, refer to Pearl [18] and Tian and Pearl [21].

Finally, because we assume that both X and Y are dichotomous variables, there are four possible potential outcome types at the unit level [22]:

(i) a subject who becomes diseased regardless of whether the protective exposure is taken (doomed), that is, $\{i \mid Y_{x_1}(i)=y_1,\ Y_{x_0}(i)=y_1\}$;
(ii) a subject who avoids the disease only by taking the protective exposure (preventive), that is, $\{i \mid Y_{x_1}(i)=y_0,\ Y_{x_0}(i)=y_1\}$;
(iii) a subject who avoids the disease only by not taking the protective exposure (causative), that is, $\{i \mid Y_{x_1}(i)=y_1,\ Y_{x_0}(i)=y_0\}$; and
(iv) a subject who does not become diseased regardless of whether the protective exposure is taken (immune), that is, $\{i \mid Y_{x_1}(i)=y_0,\ Y_{x_0}(i)=y_0\}$.

3.2 Definition and basic properties

3.2.1 Counterfactual-based prevented proportions (CPPs)

To focus on the prevented impact of a protective exposure on the total number of diseases that would be expected if exposure was absent, we define the counterfactual-based prevented proportion in the population (CPPp) as

$$\mathrm{CPPp}=\mathrm{pr}(y_0\mid Y_{x_0}=y_1). \tag{7}$$

CPPp is interpreted as the proportion of undiseased subjects in the totality of subjects who have a potential outcome type of either “doomed” or “preventive”.

When we are interested in the prevented impact of a protective exposure on the expected number of exposed cases that would be expected if exposure was absent, we define the counterfactual-based prevented proportion among the exposed (CPPe) as

$$\mathrm{CPPe}=\mathrm{pr}(y_0\mid x_1,\,Y_{x_0}=y_1). \tag{8}$$

CPPe is interpreted as the proportion of undiseased subjects in the group of exposed subjects who have a potential outcome type of either “doomed” or “preventive”.

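Noting that $\mathrm{pr}(x_0,y_0,Y_{x_0}=y_1)=0$ holds from the consistency property, we have

$$\mathrm{CPPp}=\mathrm{pr}(x_0,y_0\mid Y_{x_0}=y_1)+\mathrm{pr}(x_1,y_0\mid Y_{x_0}=y_1)=\mathrm{pr}(x_1,y_0\mid Y_{x_0}=y_1)=\mathrm{CPPe}\times\mathrm{pr}(x_1\mid Y_{x_0}=y_1)\le\mathrm{CPPe}, \tag{9}$$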

which shows that CPPp does not directly determine CPPe (and vice versa) and that they are not entirely independent of each other. Additionally, eq. (9) shows that CPPe over-evaluates CPPp.

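Because we have $\mathrm{pr}(x_1,Y_{x_0}=y_1)=\mathrm{pr}(Y_{x_0}=y_1)-\mathrm{pr}(x_0,y_1)$ from the consistency property, we obtain

$$\mathrm{CPPp}=\frac{\mathrm{pr}(Y_{x_0}=y_1\mid x_1,y_0)\,\mathrm{pr}(x_1,y_0)}{\mathrm{pr}(Y_{x_0}=y_1)} \tag{10}$$

$$\mathrm{CPPe}=\frac{\mathrm{pr}(Y_{x_0}=y_1\mid x_1,y_0)\,\mathrm{pr}(x_1,y_0)}{\mathrm{pr}(Y_{x_0}=y_1)-\mathrm{pr}(x_0,y_1)}. \tag{11}$$

Thus, under the condition $\mathrm{pr}(Y_{x_0}=y_1)\neq\mathrm{pr}(x_0,y_1)$, $\mathrm{pr}(x_0,y_1)=0$ implies CPPp=CPPe. Finally, under exogeneity, we have

$$\mathrm{CPPp}=\mathrm{pr}(x_1)\,\mathrm{CPPe}\quad\text{and}\quad\mathrm{CPPe}=\mathrm{pr}(Y_{x_1}=y_0\mid Y_{x_0}=y_1). \tag{12}$$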
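The relations in eqs. (9) and (12) can be checked by exact enumeration. The sketch below assumes a hypothetical distribution over the four response types and an exposure assigned independently of type (exogeneity); all numbers and names (`type_prob`, `p_exposed`, `pr`) are illustrative and do not come from the paper.

```python
from itertools import product

# Hypothetical distribution over the four response types, keyed by
# (Y_{x1}, Y_{x0}) with 1 = diseased (y1) and 0 = undiseased (y0).
type_prob = {
    (1, 1): 0.10,  # doomed
    (0, 1): 0.30,  # preventive
    (1, 0): 0.05,  # causative
    (0, 0): 0.55,  # immune
}
p_exposed = 0.40  # pr(x1), assigned independently of type (exogeneity)

# Joint distribution of (X, Y, Y_{x0}); the observed Y follows from consistency.
joint = {}
for x, (y_x1, y_x0) in product((0, 1), type_prob):
    p_x = p_exposed if x == 1 else 1.0 - p_exposed
    y = y_x1 if x == 1 else y_x0
    key = (x, y, y_x0)
    joint[key] = joint.get(key, 0.0) + p_x * type_prob[(y_x1, y_x0)]


def pr(event):
    """Probability of the set of (x, y, y_x0) triples satisfying `event`."""
    return sum(p for (x, y, y_x0), p in joint.items() if event(x, y, y_x0))


# Eq. (7): pr(y0 | Y_{x0} = y1); eq. (8): pr(y0 | x1, Y_{x0} = y1).
cpp_p = pr(lambda x, y, y_x0: y == 0 and y_x0 == 1) / pr(lambda x, y, y_x0: y_x0 == 1)
cpp_e = (pr(lambda x, y, y_x0: x == 1 and y == 0 and y_x0 == 1)
         / pr(lambda x, y, y_x0: x == 1 and y_x0 == 1))

print(f"CPPp={cpp_p:.3f}  CPPe={cpp_e:.3f}  pr(x1)*CPPe={p_exposed * cpp_e:.3f}")
# prints: CPPp=0.300  CPPe=0.750  pr(x1)*CPPe=0.300
```

In this toy population CPPe equals 0.30/0.40, the share of “preventive” subjects among those who are “doomed” or “preventive”, as eq. (12) suggests.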

3.2.2 Counterfactual-based preventable proportions (CPaPs)

To propose the counterfactual measure corresponding to the PaFp, we pay attention to the following two aspects: (i) the attributable proportions in the population and among the exposed are formulated as $\mathrm{pr}(Y_{x_0}=y_0\mid y_1)$ and $\mathrm{pr}(Y_{x_0}=y_0\mid x_1,y_1)$, respectively [13], and (ii) the PaFp (preventable fraction) is equivalent to the population attributable fraction (AFp) in which the “exposed” and “unexposed” categories are reversed [12]. We define the counterfactual-based preventable proportions in the population (CPaPp) and among the unexposed (CPaPu) as

$$\mathrm{CPaPp}=\mathrm{pr}(Y_{x_1}=y_0\mid y_1)\quad\text{and}\quad\mathrm{CPaPu}=\mathrm{pr}(Y_{x_1}=y_0\mid x_0,y_1), \tag{13}$$

respectively. CPaPp can be regarded as analogous to the “probability of enablement” and the “probability of disablement” [18, 24]. In addition, CPaPu can be regarded as analogous to the “probability of necessity” and the “probability of sufficiency” discussed in the context of “probabilities of causation” [18, 23, 24, 25, 26].

Then, from the consistency property, we have

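$$\mathrm{CPaPp}=\mathrm{pr}(x_0,Y_{x_1}=y_0\mid y_1)+\mathrm{pr}(x_1,Y_{x_1}=y_0\mid y_1)=\mathrm{pr}(x_0,Y_{x_1}=y_0\mid y_1)=\mathrm{CPaPu}\times\mathrm{pr}(x_0\mid y_1)\le\mathrm{CPaPu}. \tag{14}$$

Thus, under the condition $\mathrm{pr}(x_0,y_1)\neq 0$, from eq. (14), $\mathrm{pr}(x_1,y_1)=0$ implies CPaPp=CPaPu. Additionally, under exogeneity, we have

$$\mathrm{CPaPu}=\mathrm{CPPe}. \tag{15}$$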

4 Identification and bounds

4.1 Identification

CPPp, CPPe, CPaPp and CPaPu involve probabilities of potential outcomes; therefore they are not identifiable from observed data without any further assumptions regarding the data generating process. In this section, under negative monotonicity, that is,

$$\mathrm{pr}(Y_{x_1}=y_1,\,Y_{x_0}=y_0)=0, \tag{16}$$

when causal effects $\mathrm{pr}(y_x)$ and observed probabilities $\mathrm{pr}(x,y)$ are available, we show that CPPp, CPPe, CPaPp and CPaPu are identifiable. Negative monotonicity is an important assumption in the discussion of prevented and preventable fractions because it supports the assumption that “exposure to a given factor is believed to protect against a disease” [11] and leads to

$$\frac{\mathrm{pr}(Y_{x_1}=y_1)}{\mathrm{pr}(Y_{x_0}=y_1)}\le 1. \tag{17}$$

Additionally, negative monotonicity, together with exogeneity, connects such an assumption with eq. (4) [2, 6]. We note that eq. (17) does not imply negative monotonicity.
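That eq. (17) does not imply negative monotonicity can be seen from a small counterexample. The response-type probabilities below are hypothetical values chosen only for illustration.

```python
# Hypothetical response-type distribution with a nonzero "causative" share.
types = {"doomed": 0.10, "preventive": 0.30, "causative": 0.10, "immune": 0.50}

# pr(Y_{x1} = y1): diseased when exposed -> doomed or causative.
risk_if_exposed = types["doomed"] + types["causative"]
# pr(Y_{x0} = y1): diseased when unexposed -> doomed or preventive.
risk_if_unexposed = types["doomed"] + types["preventive"]

causal_risk_ratio = risk_if_exposed / risk_if_unexposed
satisfies_eq17 = causal_risk_ratio <= 1.0
# Negative monotonicity, eq. (16), requires pr(Y_{x1} = y1, Y_{x0} = y0) = 0.
negative_monotone = types["causative"] == 0.0

print(satisfies_eq17, negative_monotone)  # prints: True False
```

Here the causal risk ratio is about 0.5, so eq. (17) holds even though 10% of subjects are harmed by the exposure.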

When both causal effects and observed probabilities are available, under negative monotonicity, we have

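$$\mathrm{CPPp}=\frac{\mathrm{pr}(Y_{x_0}=y_1)-\mathrm{pr}(y_1)}{\mathrm{pr}(Y_{x_0}=y_1)}=\frac{\mathrm{pr}(Y_{x_0}=y_1\mid x_1)-\mathrm{pr}(y_1\mid x_1)}{\mathrm{pr}(Y_{x_0}=y_1)}\,\mathrm{pr}(x_1) \tag{18}$$

$$\mathrm{CPPe}=\frac{\mathrm{pr}(Y_{x_0}=y_1\mid x_1)-\mathrm{pr}(y_1\mid x_1)}{\mathrm{pr}(Y_{x_0}=y_1\mid x_1)}=\frac{\mathrm{pr}(Y_{x_0}=y_1)}{\mathrm{pr}(Y_{x_0}=y_1)-\mathrm{pr}(x_0,y_1)}\,\mathrm{CPPp} \tag{19}$$

$$\mathrm{CPaPp}=\frac{\mathrm{pr}(y_1)-\mathrm{pr}(Y_{x_1}=y_1)}{\mathrm{pr}(y_1)}=\frac{\mathrm{pr}(y_1\mid x_0)-\mathrm{pr}(Y_{x_1}=y_1\mid x_0)}{\mathrm{pr}(y_1)}\,\mathrm{pr}(x_0) \tag{20}$$

$$\mathrm{CPaPu}=\frac{\mathrm{pr}(y_1)-\mathrm{pr}(Y_{x_1}=y_1)}{\mathrm{pr}(x_0,y_1)}=\frac{\mathrm{pr}(y_1\mid x_0)-\mathrm{pr}(Y_{x_1}=y_1\mid x_0)}{\mathrm{pr}(y_1\mid x_0)}=\frac{\mathrm{pr}(y_1)}{\mathrm{pr}(x_0,y_1)}\,\mathrm{CPaPp}. \tag{21}$$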

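Importantly, under negative monotonicity and the exogeneity assumption, because we have $\mathrm{pr}(y_x)=\mathrm{pr}(y\mid x)$, we derive

$$\mathrm{CPPp}=\frac{\mathrm{pr}(y_1\mid x_0)-\mathrm{pr}(y_1\mid x_1)}{\mathrm{pr}(y_1\mid x_0)}\,\mathrm{pr}(x_1) \tag{22}$$

$$\mathrm{CPPe}=\mathrm{CPaPu}=\frac{\mathrm{pr}(y_1\mid x_0)-\mathrm{pr}(y_1\mid x_1)}{\mathrm{pr}(y_1\mid x_0)} \tag{23}$$

$$\mathrm{CPaPp}=\frac{\mathrm{pr}(y_1)-\mathrm{pr}(y_1\mid x_1)}{\mathrm{pr}(y_1)}=\frac{\mathrm{pr}(y_1\mid x_0)-\mathrm{pr}(y_1\mid x_1)}{\mathrm{pr}(y_1)}\,\mathrm{pr}(x_0), \tag{24}$$

which shows that CPPp, CPPe and CPaPp are equivalent to PFp, PFe and PaFp, respectively. Thus, under the negative monotonicity and exogeneity assumptions, we have $\mathrm{pr}(y_1\mid x_1)/\mathrm{pr}(y_1\mid x_0)\le 1$, and PFp, PFe and PaFp are always within the range [0.000,1.000].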
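The identification formulas (18) to (21) are easy to evaluate once the causal risks and the observed joint probabilities are given. The following Python sketch is one possible implementation; the function name and the dictionary layout `joint[(i, j)] = pr(x_i, y_j)` are assumptions made for illustration. It is applied to the hypothetical epidemic of Section 2 under exogeneity, where the results should coincide with PFp, PFe and PaFp.

```python
def identify_under_negative_monotonicity(joint, risk_x1, risk_x0):
    """Evaluate eqs. (18)-(21), assuming negative monotonicity, eq. (16).

    joint[(i, j)] : observed pr(x_i, y_j) for i, j in {0, 1}
    risk_x1       : causal risk pr(Y_{x1} = y1)
    risk_x0       : causal risk pr(Y_{x0} = y1)
    """
    p_x0_y1 = joint[(0, 1)]
    p_y1 = joint[(0, 1)] + joint[(1, 1)]
    return {
        "CPPp": (risk_x0 - p_y1) / risk_x0,               # eq. (18)
        "CPPe": (risk_x0 - p_y1) / (risk_x0 - p_x0_y1),   # eq. (19)
        "CPaPp": (p_y1 - risk_x1) / p_y1,                 # eq. (20)
        "CPaPu": (p_y1 - risk_x1) / p_x0_y1,              # eq. (21)
    }


# Section 2 example: pr(x1) = 0.4, pr(y1 | x1) = 0.02, pr(y1 | x0) = 0.10.
joint = {(0, 0): 0.540, (0, 1): 0.060, (1, 0): 0.392, (1, 1): 0.008}
# Under exogeneity the causal risks equal the conditional risks.
measures = identify_under_negative_monotonicity(joint, risk_x1=0.02, risk_x0=0.10)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

With these inputs CPPp and CPaPp reproduce the values 0.320 and 0.706 of Section 2, and CPPe and CPaPu both equal PFe.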

4.2 Bounds

Generally, even if both causal effects and observed probabilities are available, eqs. (7), (8) and (13) are not identifiable without any further assumptions. Thus, one possible solution is to derive the closed-form formulas of the bounds on these measures. To derive the bounds, we use the idea from Tian and Pearl’s bounds [26] for the probabilities of causation.

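According to Tian and Pearl [26], when both $\mathrm{pr}(y_x)$ and $\mathrm{pr}(x,y)$ are available without any further assumptions, we have

$$\max\{0,\ \mathrm{pr}(Y_{x_0}=y_1)-\mathrm{pr}(y_1)\}\le\mathrm{pr}(x_1,y_0,Y_{x_0}=y_1)\le\min\{\mathrm{pr}(x_1,y_0),\ \mathrm{pr}(Y_{x_0}=y_1)-\mathrm{pr}(x_0,y_1)\} \tag{25}$$

$$\max\{0,\ \mathrm{pr}(Y_{x_1}=y_0)-\mathrm{pr}(y_0)\}\le\mathrm{pr}(x_0,y_1,Y_{x_1}=y_0)\le\min\{\mathrm{pr}(x_0,y_1),\ \mathrm{pr}(Y_{x_1}=y_0)-\mathrm{pr}(x_1,y_0)\}. \tag{26}$$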

Regarding CPPp, from eq. (25), we have

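$$\max\left\{0,\ \frac{\mathrm{pr}(Y_{x_0}=y_1)-\mathrm{pr}(y_1)}{\mathrm{pr}(Y_{x_0}=y_1)}\right\}\le\mathrm{CPPp}\le\min\left\{\frac{\mathrm{pr}(x_1,y_0)}{\mathrm{pr}(Y_{x_0}=y_1)},\ \frac{\mathrm{pr}(Y_{x_0}=y_1)-\mathrm{pr}(x_0,y_1)}{\mathrm{pr}(Y_{x_0}=y_1)}\right\}. \tag{27}$$

When $\mathrm{pr}(x_1,y_0)=0$ or $\mathrm{pr}(Y_{x_0}=y_1)=\mathrm{pr}(x_0,y_1)$, CPPp equals zero. When $\mathrm{pr}(x_1,y_1)=0$ or $\mathrm{pr}(Y_{x_0}=y_0)=\mathrm{pr}(x_0,y_0)$, the lower and upper bounds coincide and we have

$$\mathrm{CPPp}=\frac{\mathrm{pr}(Y_{x_0}=y_1)-\mathrm{pr}(y_1)}{\mathrm{pr}(Y_{x_0}=y_1)}. \tag{28}$$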

Additionally, regarding CPaPp, from eq. (26), we have

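$$\max\left\{0,\ \frac{\mathrm{pr}(Y_{x_1}=y_0)-\mathrm{pr}(y_0)}{\mathrm{pr}(y_1)}\right\}\le\mathrm{CPaPp}\le\min\left\{\frac{\mathrm{pr}(x_0,y_1)}{\mathrm{pr}(y_1)},\ \frac{\mathrm{pr}(Y_{x_1}=y_0)-\mathrm{pr}(x_1,y_0)}{\mathrm{pr}(y_1)}\right\}. \tag{29}$$

When $\mathrm{pr}(x_0,y_1)=0$ or $\mathrm{pr}(Y_{x_1}=y_0)=\mathrm{pr}(x_1,y_0)$, CPaPp equals zero. When $\mathrm{pr}(x_0,y_0)=0$ or $\mathrm{pr}(Y_{x_1}=y_1)=\mathrm{pr}(x_1,y_1)$, the lower and upper bounds coincide and we have

$$\mathrm{CPaPp}=\frac{\mathrm{pr}(Y_{x_1}=y_0)-\mathrm{pr}(y_0)}{\mathrm{pr}(y_1)}. \tag{30}$$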

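Regarding the bounds on CPPe and CPaPu, by replacing $\mathrm{pr}(Y_{x_0}=y_1)$ and $\mathrm{pr}(y_1)$ in the denominators of eqs. (27) and (29) with $\mathrm{pr}(Y_{x_0}=y_1)-\mathrm{pr}(x_0,y_1)$ and $\mathrm{pr}(x_0,y_1)$, respectively, we have

$$\max\left\{0,\ \frac{\mathrm{pr}(Y_{x_0}=y_1)-\mathrm{pr}(y_1)}{\mathrm{pr}(Y_{x_0}=y_1)-\mathrm{pr}(x_0,y_1)}\right\}\le\mathrm{CPPe}\le\min\left\{\frac{\mathrm{pr}(x_1,y_0)}{\mathrm{pr}(Y_{x_0}=y_1)-\mathrm{pr}(x_0,y_1)},\ 1\right\} \tag{31}$$

$$\max\left\{0,\ \frac{\mathrm{pr}(Y_{x_1}=y_0)-\mathrm{pr}(y_0)}{\mathrm{pr}(x_0,y_1)}\right\}\le\mathrm{CPaPu}\le\min\left\{\frac{\mathrm{pr}(Y_{x_1}=y_0)-\mathrm{pr}(x_1,y_0)}{\mathrm{pr}(x_0,y_1)},\ 1\right\}. \tag{32}$$

Here, under the exogeneity assumption, because we have $\mathrm{pr}(y_x)=\mathrm{pr}(y\mid x)$, the lower bounds reduce to the existing measures, which shows that CPPp, CPaPp and CPPe are not lower than PFp, PaFp and PFe, respectively.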
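As a rough illustration of eqs. (25) to (32), the sketch below computes all four pairs of bounds from the causal risks and the observed joint distribution. The function name, the dictionary layout `joint[(i, j)] = pr(x_i, y_j)` and the numerical inputs are hypothetical; the inputs are chosen so that exogeneity does not hold.

```python
def bounds_with_causal_effects(joint, risk_x1, risk_x0):
    """Bounds (27), (29), (31), (32) without negative monotonicity.

    joint[(i, j)] : observed pr(x_i, y_j) for i, j in {0, 1}
    risk_x1       : causal risk pr(Y_{x1} = y1)
    risk_x0       : causal risk pr(Y_{x0} = y1)
    """
    p_x0_y1, p_x1_y0 = joint[(0, 1)], joint[(1, 0)]
    p_y1 = joint[(0, 1)] + joint[(1, 1)]
    p_y0 = joint[(0, 0)] + joint[(1, 0)]
    no_risk_x1 = 1.0 - risk_x1  # pr(Y_{x1} = y0)

    # Eq. (25): bounds on pr(x1, y0, Y_{x0} = y1).
    lo25 = max(0.0, risk_x0 - p_y1)
    hi25 = min(p_x1_y0, risk_x0 - p_x0_y1)
    # Eq. (26): bounds on pr(x0, y1, Y_{x1} = y0).
    lo26 = max(0.0, no_risk_x1 - p_y0)
    hi26 = min(p_x0_y1, no_risk_x1 - p_x1_y0)

    denom_e = risk_x0 - p_x0_y1  # pr(x1, Y_{x0} = y1)
    return {
        "CPPp": (lo25 / risk_x0, hi25 / risk_x0),    # eq. (27)
        "CPPe": (lo25 / denom_e, hi25 / denom_e),    # eq. (31)
        "CPaPp": (lo26 / p_y1, hi26 / p_y1),         # eq. (29)
        "CPaPu": (lo26 / p_x0_y1, hi26 / p_x0_y1),   # eq. (32)
    }


# Hypothetical inputs (not from the paper).
joint = {(0, 0): 0.50, (0, 1): 0.10, (1, 0): 0.36, (1, 1): 0.04}
bounds = bounds_with_causal_effects(joint, risk_x1=0.08, risk_x0=0.20)
for name, (lower, upper) in bounds.items():
    print(f"{name}: [{lower:.3f}, {upper:.3f}]")
```

For these inputs the sketch gives CPPp in [0.300, 0.500] and CPPe in [0.600, 1.000]; the lower bounds are the values that eqs. (18) to (21) would return if negative monotonicity were additionally assumed.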

One idea for deriving narrower bounds than eqs. (27), (29), (31) and (32) is to use the covariate information measured in experimental and observational studies. For the procedure, refer to Kuroki and Cai [23] and Cai et al. [27, 28]. Another idea is to introduce causal assumptions. For example, as an assumption different from negative monotonicity, that is, eq. (16), we consider the group-level negative monotone treatment response (negative MTR) assumptions for the exposed and unexposed groups, that is,

$$\mathrm{pr}(Y_{x_1}=y_1\mid x_1)\le\mathrm{pr}(Y_{x_0}=y_1\mid x_1),\qquad\mathrm{pr}(Y_{x_1}=y_1\mid x_0)\le\mathrm{pr}(Y_{x_0}=y_1\mid x_0). \tag{33}$$

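The assumption is derived from the unit-level negative MTR assumptions for exposed and unexposed groups, that is, for exposure levels $x$ and $x'$ (with the values ordered as $x_0<x_1$ and $y_0<y_1$),

$$x\le x'\ \Rightarrow\ Y_{x}(i)\ge Y_{x'}(i) \tag{34}$$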

for each unit i in both exposed and unexposed groups [29, 30]. Here, eqs. (33) provide the inequalities

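$$\mathrm{pr}(x_1,y_1,Y_{x_0}=y_0)\le\mathrm{pr}(x_1,y_0,Y_{x_0}=y_1),\qquad\mathrm{pr}(x_0,y_0,Y_{x_1}=y_1)\le\mathrm{pr}(x_0,y_1,Y_{x_1}=y_0). \tag{35}$$

Although both $\mathrm{pr}(Y_{x_0}=y_1)-\mathrm{pr}(y_1)$ and $\mathrm{pr}(Y_{x_1}=y_0)-\mathrm{pr}(y_0)$ can take negative values without any further assumption, under eqs. (35), they take non-negative values, and the lower bounds on $\mathrm{pr}(x_1,y_0,Y_{x_0}=y_1)$ and $\mathrm{pr}(x_0,y_1,Y_{x_1}=y_0)$ are given by $\mathrm{pr}(Y_{x_0}=y_1)-\mathrm{pr}(y_1)$ and $\mathrm{pr}(Y_{x_1}=y_0)-\mathrm{pr}(y_0)$, respectively.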

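When we assume negative monotonicity but the causal effects are not available, since we have

$$\mathrm{pr}(x_1,y_1)\le\mathrm{pr}(Y_{x_1}=y_1)\le\mathrm{pr}(y_1),\qquad\mathrm{pr}(y_1)\le\mathrm{pr}(Y_{x_0}=y_1)\le 1-\mathrm{pr}(x_0,y_0), \tag{36}$$

we derive

$$0\le\mathrm{CPPp}=\frac{\mathrm{pr}(Y_{x_0}=y_1)-\mathrm{pr}(y_1)}{\mathrm{pr}(Y_{x_0}=y_1)}\le\frac{\mathrm{pr}(x_1,y_0)}{1-\mathrm{pr}(x_0,y_0)} \tag{37}$$

and

$$0\le\mathrm{CPaPp}=\frac{\mathrm{pr}(y_1)-\mathrm{pr}(Y_{x_1}=y_1)}{\mathrm{pr}(y_1)}\le\mathrm{pr}(x_0\mid y_1). \tag{38}$$

Here, the bounds on CPPe are given by

$$0\le\mathrm{CPPe}=\frac{\mathrm{pr}(Y_{x_0}=y_1)-\mathrm{pr}(x_0,y_1)-\mathrm{pr}(x_1,y_1)}{\mathrm{pr}(Y_{x_0}=y_1)-\mathrm{pr}(x_0,y_1)}\le\mathrm{pr}(y_0\mid x_1) \tag{39}$$

but the bounds on CPaPu provide no information in the sense that they are given by the range [0.000,1.000] from

$$\mathrm{CPaPu}=\frac{\mathrm{pr}(y_1)-\mathrm{pr}(Y_{x_1}=y_1)}{\mathrm{pr}(x_0,y_1)}. \tag{40}$$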

Finally, when only pr(x,y) is available, without any further assumptions, the bounds on CPPp and CPaPp are given by

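$$0\le\mathrm{CPPp}\le\frac{\mathrm{pr}(x_1,y_0)}{\mathrm{pr}(x_1,y_0)+\mathrm{pr}(x_0,y_1)},\qquad 0\le\mathrm{CPaPp}\le\mathrm{pr}(x_0\mid y_1) \tag{41}$$

respectively, from eqs. (27), (29) and

$$\begin{aligned}\mathrm{CPPp}&=\frac{\mathrm{pr}(y_0,Y_{x_0}=y_1)}{\mathrm{pr}(x_1,Y_{x_0}=y_1)+\mathrm{pr}(x_0,y_1)}=\frac{\mathrm{pr}(x_1,y_0,Y_{x_0}=y_1)}{\mathrm{pr}(x_1,y_0,Y_{x_0}=y_1)+\mathrm{pr}(x_1,y_1,Y_{x_0}=y_1)+\mathrm{pr}(x_0,y_1)}\\&=\frac{\mathrm{pr}(Y_{x_0}=y_1\mid x_1,y_0)\,\mathrm{pr}(x_1,y_0)}{\mathrm{pr}(Y_{x_0}=y_1\mid x_1,y_0)\,\mathrm{pr}(x_1,y_0)+\mathrm{pr}(Y_{x_0}=y_1\mid x_1,y_1)\,\mathrm{pr}(x_1,y_1)+\mathrm{pr}(x_0,y_1)}\le\frac{\mathrm{pr}(x_1,y_0)}{\mathrm{pr}(x_1,y_0)+\mathrm{pr}(x_0,y_1)}.\end{aligned}$$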

The bounds on CPPe and CPaPu provide no information in the sense that they are given by the range [0.000,1.000].
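When only the observed joint distribution is available, the upper bounds in eqs. (37) to (41) depend on four cell probabilities alone. The following is a minimal sketch; the function name, the `negative_monotonicity` flag and the numerical inputs are illustrative assumptions rather than part of the paper.

```python
def bounds_from_observed_only(joint, negative_monotonicity):
    """Bounds when causal effects are unavailable, eqs. (37)-(41).

    joint[(i, j)] : observed pr(x_i, y_j) for i, j in {0, 1}
    Returns (lower, upper) per measure; (0.0, 1.0) means "no information".
    """
    p_x0_y0, p_x0_y1 = joint[(0, 0)], joint[(0, 1)]
    p_x1_y0, p_x1_y1 = joint[(1, 0)], joint[(1, 1)]
    p_x0_given_y1 = p_x0_y1 / (p_x0_y1 + p_x1_y1)
    if negative_monotonicity:
        return {
            "CPPp": (0.0, p_x1_y0 / (1.0 - p_x0_y0)),      # eq. (37)
            "CPPe": (0.0, p_x1_y0 / (p_x1_y0 + p_x1_y1)),  # eq. (39): pr(y0 | x1)
            "CPaPp": (0.0, p_x0_given_y1),                 # eq. (38)
            "CPaPu": (0.0, 1.0),                           # eq. (40)
        }
    return {
        "CPPp": (0.0, p_x1_y0 / (p_x1_y0 + p_x0_y1)),      # eq. (41)
        "CPPe": (0.0, 1.0),
        "CPaPp": (0.0, p_x0_given_y1),                     # eq. (41)
        "CPaPu": (0.0, 1.0),
    }


# Hypothetical joint distribution (not from the paper).
joint = {(0, 0): 0.50, (0, 1): 0.10, (1, 0): 0.36, (1, 1): 0.04}
with_nm = bounds_from_observed_only(joint, negative_monotonicity=True)
without_nm = bounds_from_observed_only(joint, negative_monotonicity=False)
print(with_nm["CPPp"], without_nm["CPPp"])
```

In this example the upper bound on CPPp tightens from about 0.783 to 0.720 once negative monotonicity is assumed, while the bound on CPaPp is unchanged.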

The results of this section are summarized in Table 1 under the assumption that observed probabilities are available, where “+” signifies that the corresponding assumption holds and “−” is used otherwise.

Table 1

Summary of assumptions and evaluations of CPPs and CPaPs.

| Negative monotonicity | Causal effects availability | CPPp | CPPe | CPaPp | CPaPu |
|---|---|---|---|---|---|
| + | + | identifiable (see eq. (18)) | identifiable (see eq. (19)) | identifiable (see eq. (20)) | identifiable (see eq. (21)) |
| − | + | bounds (see eq. (27)) | bounds (see eq. (31)) | bounds (see eq. (29)) | bounds (see eq. (32)) |
| + | − | bounds (see eq. (37)) | bounds (see eq. (39)) | bounds (see eq. (38)) | no information |
| − | − | bounds (see eq. (41)) | no information | bounds (see eq. (41)) | no information |

5 Conclusion

The idea of prevented and preventable fractions has been widely applied in medical science to evaluate how much of a disease could be averted by a protective exposure for subjects that would have become diseased in the absence of the exposure. However, the traditional formulas used in practical situations have a drawback, which is that they cannot be interpreted as proportions without any further assumptions because they are formulated based on different target populations and may fall outside the range [0.000,1.000]. To solve this problem, when both the exposure and outcome are dichotomous, we proposed four types of new measures of potential impact based on the potential outcome model. The proposed measures are proportions: they are defined based on a single target population and are always within the range [0.000,1.000]. When both causal effects and observed probabilities are available, this paper showed that the proposed measures are identifiable under the negative monotonicity assumption. The negative monotonicity assumption can be considered as a causal assumption to justify “exposure to a given factor is believed to protect against a disease” [11]. Additionally, when negative monotonicity is violated, we formulated the bounds on the proposed measures. Furthermore, we showed that negative monotonicity together with exogeneity induces equivalence between the proposed measures and existing measures. Therefore, the proposed measures are helpful for biostatistical researchers and practitioners to assess what percentage of a disease could be averted by a protective exposure for cases that would have become diseased in the absence of the exposure.

Appendices

A. Application

A.1 Experimental Study

In this appendix, we apply our results to data from a randomized trial conducted in Guangxi, China, to test the efficacy of locally produced Vi vaccine. According to Levine [31] and Yang et al. [32], in this trial, 65,287 subjects received a 30-μg dose of Vi vaccine ($X=x_1$) and 65,984 controls received a saline dose ($X=x_0$). At the time of vaccination, 92% of subjects were children aged between 5 and 19. During 19 months of follow-up, 7 cases of blood culture-confirmed typhoid fever ($Y=y_1$) were detected among the vaccinated compared with 23 confirmed cases among the controls, thus demonstrating a vaccine efficacy of 69%. For further details, refer to Levine [31] and Yang et al. [32].

Table 2

Results from an example of the experimental study.

| Negative monotonicity | Causal effects availability | CPPp | CPPe | CPaPp | CPaPu |
|---|---|---|---|---|---|
| + | + | 0.344 | 0.692 | 0.531 | 0.692 |
| − | + | [0.344,0.497] | [0.692,1.000] | [0.531,0.767] | [0.692,1.000] |

Based on the discussion in Section 4, we evaluate CPPp, CPPe, CPaPp and CPaPu of Vi vaccine on typhoid fever. Noting that the data are obtained from a randomized trial, we have $\mathrm{pr}(Y_{x_1}=y_1)=\mathrm{pr}(y_1\mid x_1)=7/65{,}287$, $\mathrm{pr}(Y_{x_0}=y_1)=\mathrm{pr}(y_1\mid x_0)=23/65{,}984$ and $\mathrm{pr}(x_1)=65{,}287/(65{,}287+65{,}984)$. When we evaluate $\mathrm{pr}(x,y)$ by $\mathrm{pr}(y\mid x)\times\mathrm{pr}(x)$, the bounds on CPPp, CPPe, CPaPp and CPaPu are [0.344,0.497], [0.692,1.000], [0.531,0.767] and [0.692,1.000], respectively. The bounds on CPPe and CPaPu are identical because this example is a randomized trial. Additionally, the bounds on CPPe and CPaPu are wider than those on CPPp and CPaPp, respectively. By contrast, under negative monotonicity, we have CPPp=0.344, CPPe=0.692, CPaPp=0.531 and CPaPu=0.692. The results are summarized in Table 2.
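The figures in Table 2 can be retraced from the four counts quoted above. The script below is a sketch that plugs the counts into eqs. (25) to (32); variable names are illustrative. It relies on the fact that, under negative monotonicity, the identified values in eqs. (18) to (21) coincide with the lower bounds.

```python
# Counts reported in the text for the Guangxi Vi vaccine trial.
n_vaccinated, cases_vaccinated = 65_287, 7
n_control, cases_control = 65_984, 23
n_total = n_vaccinated + n_control

# Randomization gives exogeneity, so causal risks equal conditional risks.
risk_x1 = cases_vaccinated / n_vaccinated   # pr(Y_{x1} = y1) = pr(y1 | x1)
risk_x0 = cases_control / n_control         # pr(Y_{x0} = y1) = pr(y1 | x0)

# Observed joint probabilities pr(x, y), i.e. pr(y | x) * pr(x).
p_x1_y1 = cases_vaccinated / n_total
p_x1_y0 = (n_vaccinated - cases_vaccinated) / n_total
p_x0_y1 = cases_control / n_total
p_y1 = p_x1_y1 + p_x0_y1
p_y0 = 1.0 - p_y1

# Bounds on the joint terms, eqs. (25) and (26).
lo25 = max(0.0, risk_x0 - p_y1)
hi25 = min(p_x1_y0, risk_x0 - p_x0_y1)
lo26 = max(0.0, (1.0 - risk_x1) - p_y0)
hi26 = min(p_x0_y1, (1.0 - risk_x1) - p_x1_y0)

bounds = {
    "CPPp": (lo25 / risk_x0, hi25 / risk_x0),                           # eq. (27)
    "CPPe": (lo25 / (risk_x0 - p_x0_y1), hi25 / (risk_x0 - p_x0_y1)),   # eq. (31)
    "CPaPp": (lo26 / p_y1, hi26 / p_y1),                                # eq. (29)
    "CPaPu": (lo26 / p_x0_y1, hi26 / p_x0_y1),                          # eq. (32)
}
for name, (lower, upper) in bounds.items():
    # The lower bound is also the point value under negative monotonicity.
    print(f"{name}: [{lower:.3f}, {upper:.3f}]")
```

The upper bound on CPPp reduces to pr(x_1) here, and the upper bound on CPaPp to 23/30, the share of observed cases that occurred among controls.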

A.2 Observational Study

In this appendix, we apply our results to data from the INTERHEART study which was conducted by Yusuf et al. [33] and reanalyzed by Walter [34]. The aim of this observational study was to evaluate the effect of potentially modifiable risk factors on acute myocardial infarction in 52 countries. In this study, based on the population attributable risks (PAR), various risk factors were studied, including smoking, history of hypertension or diabetes, waist/hip ratio, dietary patterns, physical activity, consumption of alcohol, blood apolipoproteins, and psychosocial factors to acute myocardial infarction. For details, refer to Yusuf et al. [33] and Walter [34]. Following Walter [34], we use data on smoking status from 10 world regions, shown in Table 3, and evaluate proposed measures.

Table 3

Data from the INTERHEART study [33, 34].

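| World regions | Cases: ever smokers | Cases: never smokers | Controls: ever smokers | Controls: never smokers |
|---|---|---|---|---|
| Western Europe | 447 | 205 | 412 | 339 |
| Central/Eastern Europe | 1102 | 601 | 1007 | 914 |
| Middle East/Egypt | 1096 | 502 | 818 | 920 |
| Africa | 408 | 156 | 436 | 339 |
| South Asia | 986 | 616 | 859 | 1249 |
| China/Hong Kong | 1749 | 1266 | 1232 | 1816 |
| South-East Asia/Japan | 658 | 275 | 685 | 514 |
| Australia | 447 | 141 | 370 | 309 |
| South America/Mexico | 800 | 391 | 909 | 968 |
| North America | 217 | 70 | 218 | 121 |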

First, we assume that the world region (Z) is a sufficient confounder between smoking status (X) and acute myocardial infarction (Y), that is, Z satisfies the SITA condition, where smoking status consists of “never smoke” (protective exposure; $X=x_1$) and “ever (former or current) smoke” (non-protective exposure; $X=x_0$). Then, based on the discussion in Section 4, we evaluate CPPp, CPPe, CPaPp and CPaPu of smoking on acute myocardial infarction. Because we assume that the world region (Z) is a sufficient confounder, the causal effects of $X=x_1$ on $Y=y_1$ and of $X=x_0$ on $Y=y_1$ are given by $\mathrm{pr}(Y_{x_1}=y_1)=0.359$ and $\mathrm{pr}(Y_{x_0}=y_1)=0.535$, respectively. Thus, the bounds on CPPp, CPPe, CPaPp and CPaPu are given by [0.146,0.443], [0.329,1.000], [0.214,0.652] and [0.329,1.000], respectively. Additionally, when we assume negative monotonicity, we have CPPp=0.146, CPPe=0.329, CPaPp=0.214 and CPaPu=0.329. When negative monotonicity is assumed but Z does not satisfy the SITA condition, the bounds on CPPp, CPPe and CPaPp are given by [0.000,0.382], [0.000,0.639] and [0.000,0.652], respectively; however, the bounds on CPaPu are [0.000,1.000]. On the other hand, when Z does not satisfy the SITA condition and negative monotonicity is violated, the bounds on CPPp and CPaPp are given by [0.000,0.486] and [0.000,0.652], respectively, but the bounds on CPPe and CPaPu are [0.000,1.000]. The results are summarized in Table 4.
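The adjusted causal effects quoted above follow from eq. (6) applied to the counts of Table 3, treating the pooled counts as a sample from the joint distribution of (X, Y, Z). The sketch below is one way to retrace that step; the dictionary `table3` and the function `adjusted_risk` are illustrative names.

```python
# Table 3 counts per region, ordered as
# (cases ever, cases never, controls ever, controls never).
table3 = {
    "Western Europe": (447, 205, 412, 339),
    "Central/Eastern Europe": (1102, 601, 1007, 914),
    "Middle East/Egypt": (1096, 502, 818, 920),
    "Africa": (408, 156, 436, 339),
    "South Asia": (986, 616, 859, 1249),
    "China/Hong Kong": (1749, 1266, 1232, 1816),
    "South-East Asia/Japan": (658, 275, 685, 514),
    "Australia": (447, 141, 370, 309),
    "South America/Mexico": (800, 391, 909, 968),
    "North America": (217, 70, 218, 121),
}
n_total = sum(sum(row) for row in table3.values())


def adjusted_risk(never_smoker):
    """Eq. (6): sum over regions z of pr(y1 | x, z) * pr(z)."""
    risk = 0.0
    for case_ever, case_never, ctrl_ever, ctrl_never in table3.values():
        if never_smoker:
            cases, controls = case_never, ctrl_never
        else:
            cases, controls = case_ever, ctrl_ever
        p_region = (case_ever + case_never + ctrl_ever + ctrl_never) / n_total
        risk += cases / (cases + controls) * p_region
    return risk


risk_x1 = adjusted_risk(never_smoker=True)    # pr(Y_{x1} = y1), x1 = never smoke
risk_x0 = adjusted_risk(never_smoker=False)   # pr(Y_{x0} = y1), x0 = ever smoke

# Point values under negative monotonicity, eqs. (18) and (20).
p_y1 = sum(row[0] + row[1] for row in table3.values()) / n_total
cpp_p = (risk_x0 - p_y1) / risk_x0
cpap_p = (p_y1 - risk_x1) / p_y1
print(f"{risk_x1:.3f} {risk_x0:.3f} {cpp_p:.3f} {cpap_p:.3f}")
```

The printed values can be compared with 0.359, 0.535, 0.146 and 0.214 in the text and in Table 4.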

Table 4

Results from an example of the observational study.

| Negative monotonicity | Causal effects availability | CPPp | CPPe | CPaPp | CPaPu |
|---|---|---|---|---|---|
| + | + | 0.146 | 0.329 | 0.214 | 0.329 |
| − | + | [0.146,0.443] | [0.329,1.000] | [0.214,0.652] | [0.329,1.000] |
| + | − | [0.000,0.382] | [0.000,0.639] | [0.000,0.652] | no information |
| − | − | [0.000,0.486] | no information | [0.000,0.652] | no information |

B. Extension to Multi-Categorical Exposure Cases

Here, we consider the case where X is a multi-categorical protective exposure that takes the value $x\in\{x_0,x_1,\ldots,x_l\}$, with $x_0$ as the reference (unexposed) level, and Y is a dichotomous response variable taking values in $\{y_0,y_1\}$.

B.1 Counterfactual-Based Prevented Proportions

Regarding CPPe, we extend eq. (8) to

$$\mathrm{CPPe}_k=\mathrm{pr}(y_0\mid x_k,\,Y_{x_0}=y_1). \tag{42}$$

Then, because we have

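$$\mathrm{CPPp}=\sum_{k=0}^{l}\mathrm{pr}(y_0\mid x_k,Y_{x_0}=y_1)\,\mathrm{pr}(x_k\mid Y_{x_0}=y_1)=\sum_{k=0}^{l}\mathrm{CPPe}_k\times\mathrm{pr}(x_k\mid Y_{x_0}=y_1), \tag{43}$$

under exogeneity, we derive

$$\mathrm{CPPp}=\sum_{k=0}^{l}\mathrm{pr}(y_0\mid x_k,Y_{x_0}=y_1)\,\mathrm{pr}(x_k\mid Y_{x_0}=y_1)=\sum_{k=0}^{l}\mathrm{CPPe}_k\times\mathrm{pr}(x_k) \tag{44}$$

$$\mathrm{CPPe}_k=\mathrm{pr}(Y_{x_k}=y_0\mid Y_{x_0}=y_1). \tag{45}$$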

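Additionally, under the negative monotonicity assumption, that is, $\mathrm{pr}(Y_x=y_1,\,Y_{x_0}=y_0)=0$ for $x\in\{x_1,\ldots,x_l\}$, because we have

$$\mathrm{CPPe}_k=\frac{\mathrm{pr}(Y_{x_k}=y_0\mid x_k)-\mathrm{pr}(Y_{x_0}=y_0\mid x_k)+\mathrm{pr}(Y_{x_k}=y_1,Y_{x_0}=y_0\mid x_k)}{\mathrm{pr}(Y_{x_0}=y_1\mid x_k)}=\frac{\mathrm{pr}(y_0\mid x_k)-\mathrm{pr}(Y_{x_0}=y_0\mid x_k)}{\mathrm{pr}(Y_{x_0}=y_1\mid x_k)} \tag{46}$$

for $k=1,\ldots,l$, we have

$$\mathrm{CPPp}=\sum_{k=0}^{l}\frac{\mathrm{pr}(y_0\mid x_k)-\mathrm{pr}(Y_{x_0}=y_0\mid x_k)}{\mathrm{pr}(Y_{x_0}=y_1)}\,\mathrm{pr}(x_k). \tag{47}$$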

Under negative monotonicity and exogeneity, we have

$$\mathrm{CPPp}=\sum_{k=0}^{l}\frac{\mathrm{pr}(y_0\mid x_k)-\mathrm{pr}(y_0\mid x_0)}{\mathrm{pr}(y_1\mid x_0)}\,\mathrm{pr}(x_k), \tag{48}$$

which is also discussed by Miettinen [6] as prevented fraction “for a polytomous indicator of protection.”
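A short numerical sketch of eq. (48) follows. It assumes negative monotonicity and exogeneity, takes index 0 as the reference level $x_0$, and uses hypothetical exposure prevalences and risks for three levels; the function name is illustrative.

```python
def cpp_p_polytomous(p_x, risk):
    """Eq. (48): prevented proportion for a multi-categorical exposure.

    p_x[k]  : pr(x_k), with index 0 the reference (unexposed) level x_0
    risk[k] : pr(y1 | x_k)
    """
    risk_ref = risk[0]
    total = 0.0
    for p_k, risk_k in zip(p_x, risk):
        # pr(y0 | x_k) - pr(y0 | x_0), divided by pr(y1 | x_0).
        total += ((1.0 - risk_k) - (1.0 - risk_ref)) / risk_ref * p_k
    return total


# Hypothetical three-level exposure: none, partial, full protection.
value = cpp_p_polytomous(p_x=[0.5, 0.3, 0.2], risk=[0.10, 0.06, 0.02])
print(f"{value:.3f}")  # prints: 0.280
```

With only two levels the sum has a single nonzero term and the function returns the value of eq. (22); for instance, `cpp_p_polytomous([0.6, 0.4], [0.10, 0.02])` gives 0.32 up to rounding, as in the example of Section 2.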

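Generally, even if a randomized experiment is conducted, eq. (42) is not identifiable without any further assumptions. Thus, to solve the problem, we formulate closed-form bounds on these measures. First, when both $\mathrm{pr}(Y_x=y\mid x')$ and $\mathrm{pr}(x,y)$ are available without any further assumptions ($x\neq x'$; $x,x'\in\{x_0,\ldots,x_l\}$), because the bounds on $\mathrm{CPPe}_k$ are

$$\max\left\{0,\ \frac{\mathrm{pr}(y_0\mid x_k)-\mathrm{pr}(Y_{x_0}=y_0\mid x_k)}{\mathrm{pr}(Y_{x_0}=y_1\mid x_k)}\right\}\le\mathrm{CPPe}_k\le\min\left\{1,\ \frac{\mathrm{pr}(y_0\mid x_k)}{\mathrm{pr}(Y_{x_0}=y_1\mid x_k)}\right\}, \tag{49}$$

the bounds on CPPp are given by

$$\sum_{k=0}^{l}\max\left\{0,\ \frac{\mathrm{pr}(y_0\mid x_k)-\mathrm{pr}(Y_{x_0}=y_0\mid x_k)}{\mathrm{pr}(Y_{x_0}=y_1)}\right\}\mathrm{pr}(x_k)\le\mathrm{CPPp}\le\sum_{k=1}^{l}\min\left\{1,\ \frac{\mathrm{pr}(y_0\mid x_k)}{\mathrm{pr}(Y_{x_0}=y_1)}\right\}\mathrm{pr}(x_k). \tag{50}$$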

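When only $\operatorname{pr}(x, y)$ is available, without any further assumptions, the bounds on $\mathrm{CPP}_p$ are given by

$$0 \le \mathrm{CPP}_p \le \frac{\operatorname{pr}(y') - \operatorname{pr}(x_0, y')}{\operatorname{pr}(y') - \operatorname{pr}(x_0, y') + \operatorname{pr}(x_0, y)} \tag{51}$$

because we have

$$\begin{aligned}
\mathrm{CPP}_p &= \frac{\sum_{k=0}^{l} \operatorname{pr}(x_k, y', y_{x_0})}{\sum_{k=0}^{l} \operatorname{pr}(x_k, y', y_{x_0}) + \sum_{k=0}^{l} \operatorname{pr}(x_k, y, y_{x_0})} \\
&= \frac{\sum_{k=0}^{l} \operatorname{pr}(y_{x_0} \mid x_k, y')\,\operatorname{pr}(x_k, y')}{\sum_{k=0}^{l} \operatorname{pr}(y_{x_0} \mid x_k, y')\,\operatorname{pr}(x_k, y') + \sum_{k=0}^{l} \operatorname{pr}(y_{x_0} \mid x_k, y)\,\operatorname{pr}(x_k, y)} \\
&\le \frac{\sum_{k=1}^{l} \operatorname{pr}(x_k, y')}{\sum_{k=1}^{l} \operatorname{pr}(x_k, y') + \operatorname{pr}(x_0, y)} = \frac{\operatorname{pr}(y') - \operatorname{pr}(x_0, y')}{\operatorname{pr}(y') - \operatorname{pr}(x_0, y') + \operatorname{pr}(x_0, y)}.
\end{aligned}$$

The bounds on $\mathrm{CPP}_e^{k}$ provide no information in the sense that they are given by the range [0.000,1.000].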
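The upper bound of eq. (51) depends only on the observed joint distribution, so it can be read off a contingency table. The following sketch is illustrative only. The function name `cpp_p_upper_bound`, the table layout, and the numbers are assumptions. Row 0 is taken to be the unexposed level $x_0$.

```python
from typing import Sequence, Tuple


def cpp_p_upper_bound(joint: Sequence[Tuple[float, float]]) -> float:
    """Upper bound of eq. (51); the matching lower bound is 0.

    joint[k] = (pr(x_k, y), pr(x_k, y')), with row 0 the unexposed level.
    """
    total = sum(p_y + p_not_y for p_y, p_not_y in joint)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("joint probabilities must sum to 1")
    p_x0_y, p_x0_not_y = joint[0]
    p_not_y = sum(p for _, p in joint)   # pr(y')
    numerator = p_not_y - p_x0_not_y     # pr(y') - pr(x_0, y')
    denominator = numerator + p_x0_y     # ... + pr(x_0, y)
    if denominator <= 0.0:
        raise ValueError("bound is undefined for this table")
    return numerator / denominator


# Hypothetical joint distribution over three exposure levels.
example_joint = [(0.10, 0.40), (0.03, 0.27), (0.01, 0.19)]
example_upper = cpp_p_upper_bound(example_joint)
```

For this made-up table the bound is 0.46 / 0.56, or about 0.82. The less often the unexposed group is disease-free and the rarer disease is among the unexposed, the closer the bound comes to 1.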

B.2 Counterfactual-Based Preventable Proportions

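Regarding $\mathrm{CPaP}_p$ and $\mathrm{CPaP}_u$, we extend eqs. (13) to

$$\mathrm{CPaP}_p^{k} = \operatorname{pr}(y'_{x_k} \mid y), \qquad \mathrm{CPaP}_u^{k} = \operatorname{pr}(y'_{x_k} \mid x_0, y). \tag{52}$$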

Then, we have

$$\mathrm{CPaP}_p^{k} = \sum_{m=0}^{l} \operatorname{pr}(y'_{x_k} \mid x_m, y)\,\operatorname{pr}(x_m \mid y). \tag{53}$$

Under exogeneity, we derive

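$$\mathrm{CPaP}_p^{k} = \sum_{m=0}^{l} \operatorname{pr}(y'_{x_k} \mid y_{x_m})\,\operatorname{pr}(x_m \mid y), \tag{54}$$

$$\mathrm{CPaP}_u^{k} = \operatorname{pr}(y'_{x_k} \mid y_{x_0}). \tag{55}$$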

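Additionally, under the assumption of $\operatorname{pr}(y_x, y'_{x_0}) = 0$ for $x \in \{x_1, \ldots, x_l\}$, when both $\operatorname{pr}(y_x \mid x')$ and $\operatorname{pr}(x, y)$ are available ($x \neq x'$; $x, x' \in \{x_0, \ldots, x_l\}$), $\mathrm{CPaP}_u^{k}$ is identifiable and is given by

$$\mathrm{CPaP}_u^{k} = \frac{\operatorname{pr}(y'_{x_k} \mid x_0) - \operatorname{pr}(y'_{x_0} \mid x_0) + \operatorname{pr}(y_{x_k}, y'_{x_0} \mid x_0)}{\operatorname{pr}(y \mid x_0)} = \frac{\operatorname{pr}(y'_{x_k} \mid x_0) - \operatorname{pr}(y' \mid x_0)}{\operatorname{pr}(y \mid x_0)}. \tag{56}$$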

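However, $\mathrm{CPaP}_p^{k}$ is not identifiable from such an assumption. When we assume $\operatorname{pr}(y_{x_k}, y'_{x}) = 0$ for $x \in \{x_0, \ldots, x_l\}$, we have

$$\mathrm{CPaP}_p^{k} = \sum_{m=0}^{l} \frac{\operatorname{pr}(y'_{x_k} \mid x_m) - \operatorname{pr}(y' \mid x_m)}{\operatorname{pr}(y \mid x_m)}\,\operatorname{pr}(x_m \mid y). \tag{57}$$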

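When both $\operatorname{pr}(y_x \mid x')$ and $\operatorname{pr}(x, y)$ are available without any further assumptions ($x \neq x'$; $x, x' \in \{x_0, \ldots, x_l\}$), the bounds on $\mathrm{CPaP}_u^{k}$ are

$$\max\left\{0, \frac{\operatorname{pr}(y'_{x_k} \mid x_0) - \operatorname{pr}(y' \mid x_0)}{\operatorname{pr}(y \mid x_0)}\right\} \le \mathrm{CPaP}_u^{k} \le \min\left\{1, \frac{\operatorname{pr}(y'_{x_k} \mid x_0)}{\operatorname{pr}(y \mid x_0)}\right\}. \tag{58}$$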

Thus, we have

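$$\sum_{m=0,\, m \neq k}^{l} \max\left\{0, \frac{\operatorname{pr}(y'_{x_k} \mid x_m) - \operatorname{pr}(y' \mid x_m)}{\operatorname{pr}(y \mid x_m)}\right\} \operatorname{pr}(x_m \mid y) \le \mathrm{CPaP}_p^{k} \le \sum_{m=0,\, m \neq k}^{l} \min\left\{1, \frac{\operatorname{pr}(y'_{x_k} \mid x_m)}{\operatorname{pr}(y \mid x_m)}\right\} \operatorname{pr}(x_m \mid y). \tag{59}$$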
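The bounds in eq. (59) apply the per-level bound of eq. (58) to each level other than $x_k$ and weight it by $\operatorname{pr}(x_m \mid y)$. A minimal sketch of that computation follows. The function name `cpap_p_bounds`, its arguments, and the example values are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def cpap_p_bounds(k: int,
                  p_cf_no_disease: Sequence[float],
                  p_no_disease: Sequence[float],
                  p_x_given_y: Sequence[float]) -> Tuple[float, float]:
    """Lower and upper bounds of eq. (59) on the preventable proportion.

    p_cf_no_disease[m] = pr(y'_{x_k} | x_m)  (counterfactual, assumed available)
    p_no_disease[m]    = pr(y' | x_m)
    p_x_given_y[m]     = pr(x_m | y)
    """
    lower = 0.0
    upper = 0.0
    for m, (q_m, p_m, w_m) in enumerate(
            zip(p_cf_no_disease, p_no_disease, p_x_given_y)):
        if m == k:
            continue  # the sums in eq. (59) skip the level m = k
        p_disease = 1.0 - p_m  # pr(y | x_m)
        if p_disease <= 0.0:
            raise ValueError("pr(y | x_m) must be positive")
        lower += max(0.0, (q_m - p_m) / p_disease) * w_m
        upper += min(1.0, q_m / p_disease) * w_m
    return lower, upper


# Hypothetical inputs for k = 1 with three exposure levels.
example_bounds = cpap_p_bounds(
    1,
    [0.88, 0.90, 0.91],
    [0.80, 0.90, 0.95],
    [10 / 14, 3 / 14, 1 / 14],
)
```

With these illustrative numbers only the unexposed level contributes to the lower bound (about 4/14), while both remaining levels hit the cap of 1 in the upper bound (11/14).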

When only $\operatorname{pr}(x, y)$ is available, without any further assumptions, the bounds on $\mathrm{CPaP}_p^{k}$ are given by

$$0 \le \mathrm{CPaP}_p^{k} \le 1 - \operatorname{pr}(x_k \mid y). \tag{60}$$

The bounds on $\mathrm{CPaP}_u^{k}$ provide no information in the sense that they are given by the range [0.000,1.000].
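One way to read eq. (60), offered here as a reader's interpretation rather than the authors' stated argument: diseased subjects already at level $x_k$ cannot have their disease prevented by assigning $x_k$, so at most the remaining share $1 - \operatorname{pr}(x_k \mid y)$ of cases is preventable. The sketch below computes that bound from a joint table. The function name `cpap_p_upper_bound` and the numbers are hypothetical.

```python
from typing import Sequence, Tuple


def cpap_p_upper_bound(k: int, joint: Sequence[Tuple[float, float]]) -> float:
    """Upper bound of eq. (60); the matching lower bound is 0.

    joint[m] = (pr(x_m, y), pr(x_m, y')) for each exposure level m.
    """
    p_y = sum(p for p, _ in joint)  # pr(y)
    if p_y <= 0.0:
        raise ValueError("pr(y) must be positive")
    return 1.0 - joint[k][0] / p_y  # 1 - pr(x_k | y)


# Hypothetical joint distribution over three exposure levels.
example_table = [(0.10, 0.40), (0.03, 0.27), (0.01, 0.19)]
upper_k1 = cpap_p_upper_bound(1, example_table)
upper_k0 = cpap_p_upper_bound(0, example_table)
```

For this made-up table the bound for $k = 1$ is 1 - 0.03/0.14, about 0.79, while for $k = 0$ it is about 0.29 because most cases already occur at level $x_0$.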

Acknowledgements:

We would like to thank two anonymous reviewers whose comments significantly improved the presentation of this paper. This work was partially supported by the Japan Society for the Promotion of Science (JSPS), Grant Number 15K00060.

References

1. Landrioan PJ. Epidemic measles in a divided city. J Am Med Assoc 1972;221:567–70. doi:10.1001/jama.1972.03200190013003

2. Gregg MB. Field epidemiology, the 3rd ed. New York: Oxford University Press, 2008. doi:10.1093/acprof:oso/9780195313802.001.0001

3. Centers for Disease Control and Prevention. Varicella outbreak among vaccinated children – Nebraska, 2004. Morbidity and Mortality Weekly Report 2006;55:749–52.

4. Greenwood M, Yule GU. The statistics of anti-typhoid and anti-cholera inoculations, and the interpretation of such statistics in general. Proc Royal Soc Med 1915;8:113–94. doi:10.1177/003591571500801433

5. Weinberg GA, Szilagyi PG. Vaccine epidemiology: efficacy, effectiveness, and the translational research roadmap. J Infect Dis 2010;201:1607–10. doi:10.1086/652404

6. Miettinen OS. Proportion of disease caused or prevented by a given exposure, trait or intervention. Am J Epidemiol 1974;99:325–32. doi:10.1093/oxfordjournals.aje.a121617

7. Gargiullo PM, Rothenberg RB, Wilson HG. Confidence intervals, hypothesis tests, and sample sizes for the prevented fraction in cross-sectional studies. Stat Med 1995;14:51–72. doi:10.1002/sim.4780140107

8. Benichou J. A review of adjusted estimators of attributable risk. Stat Methods Med Res 2001;10:195–216. doi:10.1177/096228020101000303

9. Laaksonen M. Population attributable fraction (PAF) in epidemiologic follow-up studies. Nat Inst Health Welfare (THL) 2010;34.

10. Greenland S. Concepts and pitfalls in measuring and interpreting attributable fractions, prevented fractions, and causation probabilities. Ann Epidemiol 2015;25:155–61. doi:10.1016/j.annepidem.2014.11.005

11. Porta M, Greenland S, Hernan M, Santos Silva I, Last JM. A dictionary of epidemiology, the 6th ed. New York: Oxford University Press, 2014. doi:10.1093/acref/9780199976720.001.0001

12. Boslaugh S. Encyclopedia of epidemiology. Thousand Oaks: Sage Publications, 2007. doi:10.4135/9781412953948

13. Suzuki E, Yamamoto E, Tsuda T. On the relations between excess fraction, attributable fraction, and etiologic fraction. Am J Epidemiol 2012;175:567–75. doi:10.1093/aje/kwr333

14. Kleinbaum DG, Kupper LL, Morgenstern H. Epidemiologic research: principles and quantitative methods. New York: John Wiley and Sons, 1982.

15. Spasoff RA. Epidemiologic methods for health policy. Oxford: Oxford University Press, 1999.

16. Sackett DL, Haynes RB, Tugwell P, Guyatt GH. Clinical epidemiology: a basic science for clinical medicine, the 2nd ed. Boston: Brown and Company, 1991.

17. Imbens GW, Rubin DB. Causal inference in statistics, social, and biomedical sciences. New York: Cambridge University Press, 2015. doi:10.1017/CBO9781139025751

18. Pearl J. Causality: models, reasoning, and inference, the 2nd edition. New York: Cambridge University Press, 2009. doi:10.1017/CBO9780511803161

19. Robins JM. The analysis of randomized and non-randomized AIDS treatment trials using a new approach to causal inference in longitudinal studies. In: Sechrest L, Freeman H, Mulley A, editors. Health service research methodology: a focus on AIDS. Washington D.C.: U.S. Public Health Service, National Center for Health Services Research, 1989:113–59.

20. Rosenbaum PR, Rubin DB. The central role of propensity score in observational studies for causal effects. Biometrika 1983;70:41–55. doi:10.1093/biomet/70.1.41

21. Tian J, Pearl J. A general identification condition for causal effects. Proc 18th Nat Conf Artif Intell 2002:567–73.

22. Greenland S, Robins JM. Identifiability, exchangeability, and epidemiological confounding. Int J Epidemiol 1986;15:413–9. doi:10.1093/ije/15.3.413

23. Kuroki M, Cai Z. Statistical analysis of probabilities of causation using covariate information. Scand J Stat 2011;38:564–77.

24. Pearl J. Probabilities of causation: three counterfactual interpretations and their identification. Synthese 1999;121:93–149. doi:10.1023/A:1005233831499

25. Robins JM, Greenland S. The probability of causation under a stochastic model for individual risk. Biometrics 1989;45:1125–38. doi:10.2307/2531765

26. Tian J, Pearl J. Probabilities of causation: bounds and identification. Ann Math Artif Intell 2000;28:287–313. doi:10.1023/A:1018912507879

27. Cai Z, Kuroki M, Pearl J, Tian J. Bounds on direct effects in the presence of confounded intermediate variables. Biometrics 2008;64:695–701. doi:10.1111/j.1541-0420.2007.00949.x

28. Cai Z, Kuroki M, Sato T. Non-parametric bounds on treatment effects with non-compliance by covariate adjustment. Stat Med 2007;26:3188–204. doi:10.1002/sim.2766

29. Jiang Z, Chiba Y, VanderWeele TJ. Monotone confounding, monotone treatment selection and monotone treatment response. J Causal Inference 2014;2:1–12. doi:10.1515/jci-2012-0006

30. Manski CF, Pepper JV. Monotone instrumental variables: with an application to the returns to schooling. Econometrica 2000;68:997–1010. doi:10.1111/1468-0262.00144

31. Levine MM. Typhoid fever vaccines. Vaccines, the 6th ed, eds: Plotkin SA, Orenstein WA. Philadelphia: W.B. Saunders, 2013:812–836.

32. Yang HH, Wu CG, Xie GZ, Gu QW, Wang BR, Wang LY et al. Efficacy trial of Vi polysaccharide vaccine against typhoid fever in south-western China. Bull World Health Organiz 2001;79:625–31.

33. Yusuf S, Hawken S, Ounpuu S, Dans T, Avezum A, Lanas F et al. Effect of potentially modifiable risk factors associated with myocardial infarction in 52 countries (the INTERHEART study): case-control study. The Lancet 2004;364:937–52. doi:10.1016/S0140-6736(04)17018-9

34. Walter SD. Local estimates of population attributable risk. J Clin Epidemiol 2010;63:85–93. doi:10.1016/j.jclinepi.2009.02.001

Published Online: 2017-9-6

© 2017 Walter de Gruyter GmbH, Berlin/Boston

This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Downloaded on 25.4.2024 from https://www.degruyter.com/document/doi/10.1515/jci-2016-0020/html