Erratum to: Hennessy J, Dasgupta T, Miratrix L, Pattanayak C, Sarkar P. A conditional randomization test to account for covariate imbalance in randomized experiments. J Causal Inference 2016;4(1):61–80 (https://doi.org/10.1515/jci-2015-0018).
There was an error in [1] and we are very grateful to Peng Ding for pointing it out.
Proposition 1, restated below, is incorrect: our proof misapplied a result from [2].
Proposition 1. Let $X$ denote a categorical covariate with $J$ levels, observed after a two-armed randomized experiment is conducted with $N$ units. Let $N_j$ denote the observed number of units that belong to stratum $j$, and let $N_{Tj}$ and $N_{Cj}$ denote the number of units assigned to treatment and control, respectively, in stratum $j$, such that $N_{Tj}+N_{Cj}=N_j$ and $\sum_{j=1}^{J} N_j=N$. Then the conditional randomization test using the simple difference test statistic $\hat{\tau}_{sd}=\bar{Y}_T^{obs}-\bar{Y}_C^{obs}$ and the balance function $(N_{T1}, \ldots, N_{TJ})$ is equivalent to the conditional randomization test using the composite test statistic $\hat{\tau}_{ps}=\sum_{j=1}^{J}\frac{N_j}{N}\hat{\tau}_{sd,j}$, where $\hat{\tau}_{sd,j}$ denotes the simple difference test statistic for the $j$th stratum.
We can show the proposition is not true by a simple counterexample. In order for the two conditional tests to be equivalent, they must yield the same p-values. Consider the situation where $N=5$, $X=(1,1,1,2,2)$, $w=(1,0,0,1,0)$, and $y^{obs}=(1.13,0.49,-0.31,0.98,1.68)$. In this case, $\hat{\tau}_{sd}=0.435$ and $\hat{\tau}_{ps}=0.344$. To find the p-values for the conditional test, we consider the values of the test statistics across all 6 alternative randomizations where $N_{T1}=1$ and $N_{T2}=1$ (Table 1).
To calculate the p-values, we find the proportion of test statistics as or more extreme than the observed. For $\hat{\tau}_{sd}=0.435$, there are three test statistics as large or larger (0.435, 0.485, 1.018), so the 2-sided p-value is $2\cdot 3/6=1$. For $\hat{\tau}_{ps}=0.344$, there are two test statistics as large or larger (0.344, 0.904), so the 2-sided p-value is $2\cdot 2/6=2/3$. Since the p-values do not agree, the tests are not equivalent.
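The counterexample can be reproduced by brute-force enumeration. The following is a minimal sketch, not the authors' code: the function names (`tau_sd`, `tau_ps`, `conditional_assignments`, `p_value`) are hypothetical, and the two-sided p-value follows the convention used in the text (twice the proportion of statistics at least as large as the observed one), with a cap at 1 added here as an assumption.

```python
from fractions import Fraction
from itertools import combinations, product

# Data from the counterexample in the text.
X = (1, 1, 1, 2, 2)
w_obs = (1, 0, 0, 1, 0)
y = (1.13, 0.49, -0.31, 0.98, 1.68)


def mean(values):
    return sum(values) / len(values)


def tau_sd(w, idx=None):
    """Simple difference in means, optionally restricted to the units in idx."""
    idx = range(len(w)) if idx is None else idx
    treated = [y[i] for i in idx if w[i] == 1]
    control = [y[i] for i in idx if w[i] == 0]
    return mean(treated) - mean(control)


def tau_ps(w):
    """Stratum-size-weighted sum of within-stratum simple differences."""
    total = 0.0
    for level in sorted(set(X)):
        idx = [i for i in range(len(X)) if X[i] == level]
        total += len(idx) / len(X) * tau_sd(w, idx)
    return total


def conditional_assignments(w):
    """Yield every assignment with the same treated count per stratum as w."""
    strata = [[i for i in range(len(X)) if X[i] == level]
              for level in sorted(set(X))]
    choices = [combinations(idx, sum(w[i] for i in idx)) for idx in strata]
    for picks in product(*choices):
        treated = {i for pick in picks for i in pick}
        yield tuple(1 if i in treated else 0 for i in range(len(X)))


def p_value(stat, w):
    """Twice the share of statistics >= observed, capped at 1 (assumption)."""
    observed = stat(w)
    draws = [stat(a) for a in conditional_assignments(w)]
    n_ge = sum(d >= observed - 1e-12 for d in draws)  # tolerance for float ties
    return min(Fraction(1), Fraction(2 * n_ge, len(draws)))


assignments = list(conditional_assignments(w_obs))
p_sd = p_value(tau_sd, w_obs)
p_ps = p_value(tau_ps, w_obs)

for a in assignments:
    print(a, round(tau_sd(a), 3), round(tau_ps(a), 3))
print("p-value (tau_sd):", p_sd)
print("p-value (tau_ps):", p_ps)
```

Under these assumptions the enumeration visits the six assignments of Table 1 (in a different order) and returns p-values of 1 and 2/3, matching the calculation above.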
Table 1
Alternative randomizations: For each alternative randomization where $N_{T1}=1$ and $N_{T2}=1$, we calculate both test statistics.
Randomization | $\hat{\tau}_{sd}$ | $\hat{\tau}_{ps}$ |
(1, 0, 0, 1, 0) | 0.435 | 0.344 |
(0, 1, 0, 1, 0) | −0.098 | −0.232 |
(0, 0, 1, 1, 0) | −0.765 | −0.952 |
(1, 0, 0, 0, 1) | 1.018 | 0.904 |
(0, 1, 0, 0, 1) | 0.485 | 0.328 |
(0, 0, 1, 0, 1) | −0.182 | −0.392 |
The incorrect proof in Appendix A mis-applied a result from [2] that showed that in a linear regression of the response on the treatment indicator and covariates, a conditional randomization test based on the treatment indicator coefficient is equivalent to the conditional randomization test based on the simple difference test statistic. This proof rests on the fact that the columns corresponding to the covariates are fixed across randomizations. While our $\hat{\tau}_{ps}$ does equal a regression coefficient, that regression includes interactions between the treatment indicator and the covariates. However, these interaction terms are not fixed across the different randomizations and we ignored this fact in the proof. In the proof in Appendix A, we incorrectly assumed $k_1=w^{T}F(F^{T}F)^{-1}F^{T}y^{obs}$ is a constant. While $w^{T}F(F^{T}F)^{-1}$ is a constant, $F^{T}y^{obs}$ is not.
Note that the main results and conclusions regarding conditional randomization tests from [1] do not depend on the proposition. The proposition was only used in the simulation study to reduce the number of tests to be compared. Rather than reporting the conditional tests using both $\hat{\tau}_{sd}$ and $\hat{\tau}_{ps}$, we only reported results using $\hat{\tau}_{sd}$. However, in the specific simulation setting we explored, $\hat{\tau}_{ps}$ is, in fact, a monotonic function of $\hat{\tau}_{sd}$ when conditioning on the observed balance because there are two strata of equal size and the treated and control groups are of equal size. In this situation, the conditional tests using $\hat{\tau}_{sd}$ and $\hat{\tau}_{ps}$ are equivalent. We verify this fact below. For this situation, our test statistics can be expanded as
$$\hat{\tau}_{sd}=\left(\frac{N_{T1}}{N_T}\bar{Y}_{T1}^{obs}+\frac{N_{T2}}{N_T}\bar{Y}_{T2}^{obs}\right)-\left(\frac{N_{C1}}{N_C}\bar{Y}_{C1}^{obs}+\frac{N_{C2}}{N_C}\bar{Y}_{C2}^{obs}\right)=\frac{2}{N}\left(N_{T1}\bar{Y}_{T1}^{obs}+N_{T2}\bar{Y}_{T2}^{obs}-N_{C1}\bar{Y}_{C1}^{obs}-N_{C2}\bar{Y}_{C2}^{obs}\right)$$
and
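$$\hat{\tau}_{ps}=\frac{N_1}{N}\left(\bar{Y}_{T1}^{obs}-\bar{Y}_{C1}^{obs}\right)+\frac{N_2}{N}\left(\bar{Y}_{T2}^{obs}-\bar{Y}_{C2}^{obs}\right)=\frac{1}{2}\left(\bar{Y}_{T1}^{obs}+\bar{Y}_{T2}^{obs}-\bar{Y}_{C1}^{obs}-\bar{Y}_{C2}^{obs}\right).$$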
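We then show that $\hat{\tau}_{ps}$ is a monotonic function of $\hat{\tau}_{sd}$ by showing that
$$\hat{\tau}_{ps}=\frac{N}{4N_{T1}N_{C1}}\left(\frac{N}{4}\left(\hat{\tau}_{sd}+2\bar{Y}^{obs}\right)-N_{T1}\bar{Y}_{1}^{obs}-N_{T2}\bar{Y}_{2}^{obs}\right).$$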
The proof is available upon request.
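The authors' proof is not reproduced in this erratum. As a hedged sketch only, one way the identity could be derived under the stated assumptions ($N_1=N_2=N/2$ and $N_T=N_C=N/2$, which give $N_{T2}=N_{C1}$ and $N_{C2}=N_{T1}$) is outlined below. The symbol $A$ for the sum of treated outcomes is notation introduced here and does not appear in [1]; $\bar{Y}_{j}^{obs}$ is read as the mean observed outcome in stratum $j$.

```latex
\begin{align*}
A &:= N_{T1}\bar{Y}_{T1}^{obs} + N_{T2}\bar{Y}_{T2}^{obs}
  && \text{(sum of treated outcomes; notation assumed here)} \\
\hat{\tau}_{sd} &= \frac{2}{N}\left(2A - N\bar{Y}^{obs}\right)
  \;\Longrightarrow\;
  A = \frac{N}{4}\left(\hat{\tau}_{sd} + 2\bar{Y}^{obs}\right) \\
\hat{\tau}_{ps} &= \frac{1}{2}\left(
    \frac{N_{T1}\bar{Y}_{T1}^{obs} - N_{C2}\bar{Y}_{C2}^{obs}}{N_{T1}}
  + \frac{N_{T2}\bar{Y}_{T2}^{obs} - N_{C1}\bar{Y}_{C1}^{obs}}{N_{C1}}
  \right)
  && \text{(using } N_{C2}=N_{T1},\; N_{T2}=N_{C1}\text{)} \\
 &= \frac{1}{2}\left(
    \frac{A - \tfrac{N}{2}\bar{Y}_{2}^{obs}}{N_{T1}}
  + \frac{A - \tfrac{N}{2}\bar{Y}_{1}^{obs}}{N_{C1}}
  \right)
  && \text{(since } N_{Tj}\bar{Y}_{Tj}^{obs} + N_{Cj}\bar{Y}_{Cj}^{obs} = \tfrac{N}{2}\bar{Y}_{j}^{obs}\text{)} \\
 &= \frac{N}{4N_{T1}N_{C1}}\left(
    A - N_{T1}\bar{Y}_{1}^{obs} - N_{T2}\bar{Y}_{2}^{obs}
  \right)
  && \text{(using } N_{T1}+N_{C1}=\tfrac{N}{2}\text{)}
\end{align*}
```

Substituting the expression for $A$ into the last line gives the displayed identity. With $y^{obs}$ held fixed across randomizations and the balance $(N_{T1}, N_{T2})$ fixed by conditioning, every quantity other than $\hat{\tau}_{sd}$ is constant and the coefficient $N^2/(16N_{T1}N_{C1})$ is positive, so $\hat{\tau}_{ps}$ is an increasing function of $\hat{\tau}_{sd}$.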
If we were to include the conditional randomization test using $\hat{\tau}_{ps}$ in the simulation study, the results would be the same as those reported for the conditional randomization test using $\hat{\tau}_{sd}$. We leave a formal comparison of conditional randomization tests using $\hat{\tau}_{sd}$ and $\hat{\tau}_{ps}$ for future work.
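The equivalence claimed for the balanced two-stratum setting can be spot-checked numerically. The following is a minimal sketch on simulated outcomes: the sample size ($N=8$), the seed, the Gaussian outcomes, the chosen balance ($N_{T1}=1$, $N_{T2}=3$), and all function names are illustrative assumptions, not the simulation design of [1].

```python
import random
from itertools import combinations, product

random.seed(0)

N = 8
strata = [list(range(0, 4)), list(range(4, 8))]  # two strata of size N/2
y = [random.gauss(0, 1) for _ in range(N)]       # simulated outcomes (assumption)
n_t1, n_t2 = 1, 3                                # treated counts; n_t1 + n_t2 = N/2
n_c1 = len(strata[0]) - n_t1                     # control count in stratum 1


def mean(values):
    return sum(values) / len(values)


def both_stats(treated):
    """Return (tau_sd, tau_ps) for a tuple of treated unit indices."""
    t = set(treated)
    sd = (mean([y[i] for i in range(N) if i in t])
          - mean([y[i] for i in range(N) if i not in t]))
    ps = 0.0
    for idx in strata:
        diff = (mean([y[i] for i in idx if i in t])
                - mean([y[i] for i in idx if i not in t]))
        ps += len(idx) / N * diff
    return sd, ps


def affine_map(sd):
    """Right-hand side of the identity expressing tau_ps through tau_sd."""
    ybar = mean(y)
    ybar1 = mean([y[i] for i in strata[0]])
    ybar2 = mean([y[i] for i in strata[1]])
    return N / (4 * n_t1 * n_c1) * (
        N / 4 * (sd + 2 * ybar) - n_t1 * ybar1 - n_t2 * ybar2
    )


# Enumerate every assignment with the fixed balance (n_t1, n_t2).
pairs = [both_stats(a + b)
         for a, b in product(combinations(strata[0], n_t1),
                             combinations(strata[1], n_t2))]

# Largest discrepancy between tau_ps and the affine function of tau_sd.
max_gap = max(abs(ps - affine_map(sd)) for sd, ps in pairs)

# The two statistics should rank the assignments identically.
order_sd = sorted(range(len(pairs)), key=lambda k: pairs[k][0])
order_ps = sorted(range(len(pairs)), key=lambda k: pairs[k][1])
same_order = order_sd == order_ps

print("assignments:", len(pairs))
print("max |tau_ps - affine(tau_sd)|:", max_gap)
print("identical ranking:", same_order)
```

If the two statistics rank every conditional assignment identically, any p-value defined through those ranks coincides. This sketch checks one simulated data set and is not a substitute for the proof.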