Abstract
Each cluster consists of multiple subunits from which outcome data are collected. In a subunit randomization trial, subunits are randomized into different intervention arms. Observations from subunits within each cluster tend to be positively correlated because of shared frailties, so the outcome data from a subunit randomization trial are dependent between arms as well as within each arm. For subunit randomization trials with a survival endpoint, few sample size methods show a clear relationship between the joint survival distribution of subunits and the sample size, especially when the number of subunits per cluster varies. In this paper, we propose a closed-form sample size formula for the weighted rank test to compare the marginal survival distributions between intervention arms under subunit randomization, possibly with a variable number of subunits among clusters. We conduct extensive simulations to evaluate the performance of our formula under various design settings, and demonstrate our sample size calculation method with real clinical trials.
References
Aalen O (1978) Nonparametric inference for a family of counting processes. Ann Stat 6:701–726
Batchelor J, Hackett M (1970) HL-A matching in treatment of burned patients with skin allografts. The Lancet 296(7673):581–583
Fleming TR, Harrington DP (2011) Counting processes and survival analysis, vol 169. Wiley, London
Freedman LS (1982) Tables of the number of patients required in clinical trials using the logrank test. Stat Med 1(2):121–129
Gangnon RE, Kosorok MR (2004) Sample-size formula for clustered survival data using weighted log-rank statistics. Biometrika 91(2):263–275
Gehan EA (1965) A generalized Wilcoxon test for comparing arbitrarily singly-censored samples. Biometrika 52(1–2):203–224
George SL, Desu M (1974) Planning the size and duration of a clinical trial studying the time to some critical event. J Chronic Dis 27(1–2):15–24
Gumbel EJ (1960) Bivariate exponential distributions. J Am Stat Assoc 55(292):698–707
Harper CC, Rocca CH, Thompson KM, Morfesis J, Goodman S, Darney PD, Westhoff CL, Speidel JJ (2015) Reductions in pregnancy rates in the USA with long-acting reversible contraception: a cluster randomised trial. The Lancet 386(9993):562–568
Harrington DP, Fleming TR (1982) A class of rank test procedures for censored survival data. Biometrika 69(3):553–566
Jeong JH, Jung SH (2006) Rank tests for clustered survival data when dependent subunits are randomized. Stat Med 25(3):361–373
Jung SH (2008) Sample size calculation for the weighted rank statistics with paired survival data. Stat Med 27(17):3350–3365
Kalbfleisch JD, Prentice RL (2011) The statistical analysis of failure time data, vol 360. Wiley, London
Lakatos E (1988) Sample sizes based on the log-rank statistic in complex clinical trials. Biometrics 44:229–241
Lakatos E, Lan KG (1992) A comparison of sample size methods for the logrank statistic. Stat Med 11(2):179–191
Lee EW, Wei L, Amato DA, Leurgans S (1992) Cox-type regression analysis for large numbers of small groups of correlated failure time observations. In: Klein JP, Goel PK (eds) Survival analysis: state of the art. Springer, pp 237–247
Li J, Jung SH (2020) Sample size calculation for cluster randomization trials with a time-to-event endpoint. Stat Med 39(25):3608–3623
Lin D, Ying Z (1993) A simple nonparametric estimator of the bivariate survival function under univariate censoring. Biometrika 80(3):573–581
Martens MJ, Logan BR (2021) A unified approach to sample size and power determination for testing parameters in generalized linear and time-to-event regression models. Stat Med 40(5):1121–1132
McNeil AJ (2008) Sampling nested Archimedean copulas. J Stat Comput Simul 78(6):567–581
Nelson W (1969) Hazard plotting for incomplete failure data. J Qual Technol 1(1):27–52
Nolan J (2003) Stable distributions: models for heavy-tailed data. Birkhäuser, New York
Pals SL, Murray DM, Alfano CM, Shadish WR, Hannan PJ, Baker WL (2008) Individually randomized group treatment trials: a critical appraisal of frequently used design and analytic approaches. Am J Public Health 98(8):1418–1424
Schoenfeld DA (1983) Sample-size formula for the proportional-hazards regression model. Biometrics 39:499–503
Tarone RE, Ware J (1977) On distribution-free tests for equality of survival distributions. Biometrika 64(1):156–160
Appendices
Appendix A: Limiting distribution of the clustered rank statistic under \(H_1\)
For subunit j in cluster i that is randomized to arm k, let \(M_{ikj}(t)=N_{ikj}(t)-\int _0^t Y_{ikj}(s)d\Lambda _k(s)\) and \(M_{ik}(t)=\sum _{j=1}^{m_{ik}}M_{ikj}(t)\). By the definition of W,
Let \(\tau =\max \{t:S_1(t)S_2(t)G(t)>0\}\). Usually the upper limit of the support of the survival distributions exceeds the study period, which is the upper limit of the support of the censoring distribution, so that \(\tau \) denotes the study period. For the log-rank statistic, as \(n\rightarrow \infty \), \(n^{-1}Y_k(t)\) and \(H(t)\) uniformly converge to \(y_k(t)={\bar{m}} p_kS_k(t)G(t)\) and
in \([0,\tau ]\), respectively, so that we have
where \(\epsilon _{i}=\epsilon _{i1} - \epsilon _{i2}\), \(\epsilon _{ik} = \sum _{j=1}^{m_{ik}}\epsilon _{ikj}\), and \(\epsilon _{ikj}=\int _0^\infty y_k(t)^{-1}h(t)dM_{ikj}(t)\).
Since \(\{\epsilon _{i}, i=1,...,n\}\) are independent random variables with mean 0, by the central limit theorem, W is approximately normal with mean \(\sqrt{n}{\bar{\omega }}\), where \(\omega =\int _0^\infty h(t) \{d\Lambda _1(t)-d\Lambda _2(t)\}\), and variance \(\sigma ^2=\sigma _1+\sigma _2-2\sigma _{12}\), where
and
We can derive \(c_k\) in a rather direct way. For \(j\ne j'\), by definition,
By similar arguments to those in the lemma of Jung (2008), we have
where \(y_k(t_1,t_2) = E\{Y_{ikj}(t_1)Y_{ikj'}(t_2)\}=G(t_1,t_2)S_k(t_1,t_2)\). We can also derive
and
Therefore
Similarly we have
where \(y(t_1,t_2) = E\{Y_{i1j}(t_1)Y_{i2j'}(t_2)\}=G(t_1,t_2)S_{12}(t_1,t_2)\).
On the other hand, by definition,
By the uniform convergence of \(n^{-1}Y_k(t)\) and \(Y_k(t)^{-1}dN_k(t)\) to \(y_k(t)\) and \(d\Lambda _k(t)\), respectively, \(d{\hat{\Lambda }}(t)\) uniformly converges to \(\{y_1(t)d\Lambda _1(t)+y_2(t)d\Lambda _2(t)\}/\{y_1(t)+y_2(t)\}\) in \([0,\tau ]\). Hence, we have
Here,
are negligible under a nearby alternative hypothesis. Therefore, \({\hat{\sigma }}^2=\frac{1}{n}\sum _{i=1}^n\epsilon _{i}^2 +o_p(1)\) converges to \(\sigma ^2\).
Appendix B: A simplified sample size formula under the nearby alternative hypothesis
We consider a proportional hazards model, \(\Delta =\lambda _1(t)/\lambda _2(t)\), and simplify the sample size formula under the nearby alternative hypothesis. Suppose \(S_1(t_1,t_2)\) and \(S_2(t_1,t_2)\) are commonly approximated by \(S(t_1,t_2)\). Under this assumption, we have \(\log \Delta \approx \Delta -1\) by the Taylor expansion and
where \(d=-\int _0^\infty G(t)dS(t)=P(T_{ij}<C_{ij})\) denotes the probability that a subunit experiences an event. Furthermore,
and
Let \(c_w= \int _0^\infty \int _0^\infty S(t_1,t_2)G(t_1, t_2)dA(t_1,t_2)\) and \(c_b= \int _0^\infty \int _0^\infty S_{12}(t_1,t_2)G(t_1, t_2)dA_{12}(t_1,t_2)\). Then, we have
where \(\rho _w=c_w/d\) and \(\rho _b=c_b/d\). Hence, under the nearby alternative hypothesis, (4) is expressed as
where \(\text{ DE }=1+(2p_1p_2\bar{\bar{m}}/{\bar{m}} -1)\rho _w - 2p_1p_2\rho _b\bar{\bar{m}}/{\bar{m}}\).
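The design effect above is simple to evaluate numerically. The following is a minimal sketch, not the authors' software: it assumes that \({\bar{m}}\) and \(\bar{\bar{m}}\) can be replaced by the empirical first and second moments of a list of planned cluster sizes, and the function name `srt_design_effect` is hypothetical.

```python
def srt_design_effect(cluster_sizes, p1, rho_w, rho_b):
    """Evaluate DE = 1 + (2*p1*p2*mm/m - 1)*rho_w - 2*p1*p2*rho_b*mm/m.

    Assumption: m (mean cluster size) and mm (mean squared cluster size)
    are estimated by the empirical moments of `cluster_sizes`.
    """
    if not 0.0 < p1 < 1.0:
        raise ValueError("p1 must lie strictly between 0 and 1")
    if not cluster_sizes:
        raise ValueError("cluster_sizes must be non-empty")
    p2 = 1.0 - p1
    n = len(cluster_sizes)
    m_bar = sum(cluster_sizes) / n                      # first moment of m_i
    m_bar_bar = sum(m * m for m in cluster_sizes) / n   # second moment of m_i
    ratio = m_bar_bar / m_bar
    return 1.0 + (2.0 * p1 * p2 * ratio - 1.0) * rho_w - 2.0 * p1 * p2 * rho_b * ratio


if __name__ == "__main__":
    # Two subunits per cluster with equal allocation: the rho_w term vanishes.
    print(srt_design_effect([2, 2, 2], p1=0.5, rho_w=0.3, rho_b=0.1))
```

With two subunits per cluster and \(p_1=p_2=1/2\), the coefficient of \(\rho _w\) is zero and the expression reduces to \(1-\rho _b\), which gives a quick sanity check on the implementation.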
Appendix C: Calculation of parameters under practical settings given in Sect. 3.3
Under the assumption of common censoring within each cluster, we have \(G(t_1,t_2)=G(t_1\vee t_2)\). Further, with uniform accrual during accrual period a and with additional follow-up period b, we have
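As an illustration of this censoring model, the sketch below codes the standard piecewise-linear censoring survival function implied by uniform accrual over a period a followed by an additional follow-up period b, together with the resulting event probability \(d=-\int _0^\infty G(t)dS(t)\) for an exponential marginal with hazard `lam`. The function names are hypothetical, a > 0 is assumed, and the closed form for d is checked against a simple midpoint-rule integration rather than taken from the paper.

```python
import math


def censoring_survival(t, a, b):
    """G(t) under uniform accrual on [0, a] plus follow-up b (assumes a > 0)."""
    if t <= b:
        return 1.0
    if t <= a + b:
        return (a + b - t) / a
    return 0.0


def joint_censoring_survival(t1, t2, a, b):
    """Common censoring within a cluster: G(t1, t2) = G(max(t1, t2))."""
    return censoring_survival(max(t1, t2), a, b)


def event_probability(lam, a, b):
    """Closed form of d for an exponential marginal with hazard lam."""
    return 1.0 - (math.exp(-lam * b) - math.exp(-lam * (a + b))) / (lam * a)


def event_probability_numeric(lam, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of G(t)*lam*exp(-lam*t)."""
    h = (a + b) / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += censoring_survival(t, a, b) * lam * math.exp(-lam * t)
    return total * h


if __name__ == "__main__":
    print(event_probability(0.5, 2.0, 1.0))
    print(event_probability_numeric(0.5, 2.0, 1.0))
```

The two printed values should agree to several decimal places, which confirms that the closed form and the piecewise definition of G describe the same censoring mechanism.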
We assume Gumbel’s copula and the exponential marginal distribution with hazard rate \(\lambda _k\). Using the same notation as in Sect. 3.3, the within-treatment group joint distribution becomes,
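For readers who want to experiment with this model, the sketch below evaluates a bivariate survival function with exponential margins under the Gumbel-Hougaard form of Gumbel's copula. The parameterization is an assumption and may differ from the notation of Sect. 3.3: the dependence parameter `theta` lies in (0, 1], with `theta = 1` giving independence and smaller values giving stronger positive dependence. The function name is hypothetical. Passing two different hazards covers a pair of subunits in different arms.

```python
import math


def gumbel_joint_survival(t1, t2, lam1, lam2, theta):
    """S(t1, t2) = exp(-{(lam1*t1)^(1/theta) + (lam2*t2)^(1/theta)}^theta).

    Assumed parameterization (Gumbel-Hougaard form): 0 < theta <= 1,
    where theta = 1 corresponds to independent exponential survival times.
    """
    if not 0.0 < theta <= 1.0:
        raise ValueError("theta must lie in (0, 1]")
    if t1 < 0.0 or t2 < 0.0:
        raise ValueError("times must be non-negative")
    u = (lam1 * t1) ** (1.0 / theta)
    v = (lam2 * t2) ** (1.0 / theta)
    return math.exp(-((u + v) ** theta))


if __name__ == "__main__":
    # Same hazard in both coordinates: a within-arm pair.
    print(gumbel_joint_survival(1.0, 1.0, 0.3, 0.3, 0.5))
    # Different hazards: a between-arm pair.
    print(gumbel_joint_survival(1.0, 1.0, 0.3, 0.2, 0.5))
```

Setting either time to zero recovers the exponential marginal survival function of the other coordinate, which is the property the sample size formula relies on when comparing marginal distributions.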
Hence, we have
Similarly for the inter-arm distributions, we have
In addition, using the formulas given in Sect. 3.1, we have
Appendix D: Relationship between sample sizes of cluster randomization study and subunit randomization study
For CRTs with time-to-event endpoint, Li and Jung (2020) proposed that the required total number of clusters \(n_c\) can be calculated with
Subunit randomization and cluster randomization are equivalent in some special cases. First, for an equally allocated SRT with sample size \(n_s\) and mean cluster size \({\bar{m}}\), if the inter-treatment ICC \(\rho _b = 0\), it is equivalent to an equally allocated CRT with a total of \(2n_s\) clusters and mean cluster size \({\bar{m}}/2\). Since \(E\{(m_i/2)^2\} = E(m_i^2)/4 = \bar{\bar{m}}/4\), this indicates that
In addition, for equally allocated CRTs, we have
The last inequality is based on the previous equation and the fact that \(n_s(\rho _w, \rho _b,{\bar{m}}, \bar{\bar{m}},p_1)\le n_s(\rho _w, 0,{\bar{m}}, \bar{\bar{m}},p_1)\) always holds.
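The equivalence and the monotonicity in \(\rho _b\) described in this appendix can be checked numerically at the level of design effects. The sketch below is illustrative only. It uses the DE expression from Appendix B for the SRT and, as an assumption, the familiar variable-cluster-size design effect \(1+\{E(m^2)/E(m)-1\}\rho \) for the CRT, which is not confirmed here to be the exact expression of Li and Jung (2020). It also assumes the required sample size is proportional to DE. All function names are hypothetical.

```python
def srt_design_effect(m_bar, m_bar_bar, p1, rho_w, rho_b):
    """DE for subunit randomization, as given in Appendix B."""
    p2 = 1.0 - p1
    ratio = m_bar_bar / m_bar
    return 1.0 + (2.0 * p1 * p2 * ratio - 1.0) * rho_w - 2.0 * p1 * p2 * rho_b * ratio


def crt_design_effect(m_bar, m_bar_bar, rho):
    """Assumed common form of the CRT design effect with variable cluster size."""
    return 1.0 + (m_bar_bar / m_bar - 1.0) * rho


def moments(cluster_sizes):
    """Empirical first and second moments of the cluster sizes."""
    n = len(cluster_sizes)
    return sum(cluster_sizes) / n, sum(m * m for m in cluster_sizes) / n


if __name__ == "__main__":
    m_bar, m_bar_bar = moments([2, 4, 6, 8])
    rho_w = 0.2
    # SRT with rho_b = 0 and equal allocation ...
    de_srt = srt_design_effect(m_bar, m_bar_bar, 0.5, rho_w, 0.0)
    # ... versus a CRT whose clusters are the half-clusters of size m_i / 2.
    de_crt = crt_design_effect(m_bar / 2.0, m_bar_bar / 4.0, rho_w)
    print(de_srt, de_crt)
    # A positive inter-treatment ICC can only lower the SRT design effect.
    print(srt_design_effect(m_bar, m_bar_bar, 0.5, rho_w, 0.1) <= de_srt)
```

Because the coefficient of \(\rho _b\) in DE is never positive, any \(\rho _b\ge 0\) yields a design effect no larger than the \(\rho _b=0\) case, which is the inequality used in the last step of the argument.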
Li, J., Jung, SH. Sample size calculation for clustered survival data under subunit randomization. Lifetime Data Anal 28, 40–67 (2022). https://doi.org/10.1007/s10985-021-09538-0
DOI: https://doi.org/10.1007/s10985-021-09538-0