Publicly Available Published by De Gruyter October 8, 2020

Targeted design for adaptive clinical trials via semiparametric model

  • Hongbin Zhang, Ao Yuan and Ming T. Tan

Abstract

The precision medicine approach, which assigns treatment according to an individual’s personal (including molecular) profile, is revolutionizing health care. Existing statistical methods for clinical trial design typically assume a known model to estimate characteristics of treatment outcomes, which may yield biased results if the true model deviates far from the assumed one. This article aims to achieve model robustness in a phase II multi-stage adaptive clinical trial design. We propose and study a semiparametric regression mixture model in which the mixing proportions are specified according to the subjects’ profiles, and each sub-group distribution is only assumed to be unimodal for robustness. The regression parameters and the error density functions are estimated by semiparametric maximum likelihood and isotonic regression estimators, and the asymptotic properties of these estimates are studied. After a real data analysis, simulation studies are conducted to evaluate the performance of the method.

1 Introduction

Precision medicine, in which the treatment regimen is assigned to each patient according to that patient’s personal profile, is revolutionizing health care. The targeted phase II clinical trial design is the initial assessment of efficacy or screening for promising treatments. Such clinical trials use the patients’ personal profiles, such as biomarkers, to identify and help patients likely to benefit significantly from the treatments. In contrast to conventional clinical trials, targeted designs aim to identify sub-populations of patients with specific profiles for which the treatments are particularly effective. Such a design can improve the efficiency of a randomized clinical trial [1], [2]. Multi-stage design, also called group sequential design, is commonly used in phase II clinical trials to monitor the trial process, and the trial can be stopped early if an extreme response effect, or the clear lack of one, is observed at an intermediate stage [3], [4], [5]. Zhou et al. [6] and Lee et al. [7] proposed Bayesian methods for such designs; Tang and Zhou [8] considered using biomarkers and optimal allocation for such designs; and Gao [9] proposed a multi-stage adaptive biomarker-directed clinical design.

Here, the goal is to find possible favorable subgroups for each treatment based on the subjects’ covariate profiles, and the two subgroups for treatments 1 and 2 may not be the same. This problem is related to, but differs from, subgroup analysis, in which the goal is to identify a subgroup on which one treatment has a significantly better effect than the other [10], [11], [12]. Shen and He [13] proposed a logistic-normal mixture model for subgroup analysis.

Since the patients’ personal profiles are used along with their responses in the assessment of treatment outcomes, a parametric linear regression mixture model is commonly used. Existing methods often assume some known model, for example, a Gaussian linear model, to estimate the parameters. In practice, the specified model is subjective and may be misspecified, so the estimated parameters may be inconsistent or have inflated errors. Various methods have been considered to relax this assumption. Some authors considered different or more general parametric error distributions. Hanson and Johnson [14] considered the error from a mixture of Polya trees. Atsedeweyn and Rao [15] considered the error from a generalized new symmetric distribution. The most robust approach is to make no assumption on the error distribution. Yuan and De Gooijer [16] considered such a scenario: they used a kernel estimator to estimate the error density and construct an estimated likelihood, and then estimated the regression parameters by the MLE under the estimated likelihood. Yao and Zhao [17] considered a two-step method: they first estimate the errors by the least-squares residuals, estimate the error density by a kernel estimator, and then estimate the regression parameters by the MLE under the estimated likelihood. However, kernel and other density estimation methods, such as spline, basis or penalized smoothing methods, still require subjective inputs, such as the choices of kernel, bandwidth, basis, number of basis functions, penalty term, and smoothing parameter, and determining these quantities is often not easy. The nonparametric maximum likelihood estimate (NPMLE) is appealing because it is completely objective, but it does not exist in the general case. Under certain shape constraint(s), such as log-concavity, the NPMLE is unique and well studied [18], [19], and the asymptotic distributions of the NPMLE and its mode are studied in [20].
Application of this method can be found in [21] and others.

The mixture regression model is increasingly utilized in practice, for example, in clustering, classification, personalized precision medicine, subgroup analysis, and survival analysis. To address robustness in a phase II clinical trial, we propose and study a semiparametric regression mixture model in which the mixing proportions are specified according to the subjects’ profiles, and for robustness, each sub-group distribution is assumed to be unimodal with mode at zero. The nonparametric maximum likelihood estimate (NPMLE), in this case as a form of isotonic regression, is used to estimate the error densities along with the MLE of the regression parameters. The asymptotic properties of the estimators are investigated.

The rest of this article is organized as follows. In Section 2, we introduce our methodology. Section 3 presents the asymptotic properties of the proposed method. A real data set is used for illustration in Section 4, followed by simulation studies in Section 5. Concluding remarks are given in Section 6.

2 The method

In the targeted clinical trial design, a patient population with a specific disease etiology is identified by some covariate profile, such as biomarker(s). Before the trial, patients are tested on the specific profile, so that often only profile-positive patients enter the trial. This design is also referred to as an enrichment design, in the sense that a selected population is further studied in the trial. However, there is debate as to whether test-negative patients should also enter the trial and be investigated along with the test-positive patients. In this case, we just record the test result for each patient and register it as one component of the profile. In such a phase II trial, each patient is randomized to one of several treatment arms; here, without loss of generality, we consider the case of two arms. In each treatment group, there may be a sub-population of patients who respond to the treatment significantly better than the others. Below, we describe the proposed model for the problem and its estimation. Then, we describe the specifics of the multi-stage design trial.

2.1 The model and parameter estimation

Consider a clinical trial with two treatments (we arbitrarily code them as treatments 1 and 2) and k stages (typically 2 ≤ k ≤ 5). At each intermediate stage j, we observe $D_{n_j} = \{(y_i, x_i, t_i): i = 1, \ldots, n_j\}$ from $n_j$ independent subjects; they are independent copies of $(y, x, t)$, where $y_i$ is the response of the ith subject, $x_i$ is the corresponding covariate profile, including a 1 for the intercept, and $t_i$ is the treatment indicator ($t_i = 1$ or 2 for treatment 1 or 2, respectively). Denote $n_{1j} = \sum_{i=1}^{n_j} (2 - t_i)$ the number of individuals assigned to treatment 1 at stage j; similarly, $n_{2j} = \sum_{i=1}^{n_j} (t_i - 1)$ is the number of individuals assigned to treatment 2 at stage j. We adopt the common convention that the next-stage sample size includes the current one, so $n_{r,j+1} > n_{r,j}$ for j = 1, …, k−1 (r = 1, 2). The relationship between the response and the covariates is specified as

(1) $y_i = \beta' x_i + (2 - t_i)(\mu_1 \delta_{1i} + \epsilon_{1i}) + (t_i - 1)(\mu_2 \delta_{2i} + \epsilon_{2i}),$

where $\beta \in \mathbb{R}^d$ is the vector of regression parameters capturing the linear relationship between the response y and the covariates x, $\mu_r$ is the extra effect of treatment r for its treatment-favorable subgroup, and $\delta_{ri}$ is the latent membership indicator for that subgroup, i.e., $\delta_{ri} = 1$ if the ith subject in treatment r has an enhanced effect, and $\delta_{ri} = 0$ otherwise. Here "effect" stands for the response value; typically a larger response corresponds to a better treatment effect. The $\epsilon_{ri}$’s are iid errors unexplained by the linear relationship.

Note that in the above we have two sets of $\delta_{ri}$’s (i = 1, …, $n_j$; r = 1, 2), which encode the favorable sub-group membership for each of the two treatments. The two subgroups may not be the same. This differs from subgroup analysis, in which only one set of $\delta_i$ is needed, for the subgroup on which one of the treatments has a significantly better effect than the other. Also, for our problem, one may model the two treatments separately, as in sub-group analysis with one treatment. However, a joint analysis as in our setup is more efficient for the common regression coefficient $\beta$.

To reflect the fact that membership depends on the covariates, for r = 1, 2 we assume, conditionally on x and for some $\alpha_r$ to be estimated,

(2) $\Delta_r(x) = \Delta_r(x | \alpha_r) = E[\delta_{ri} | x, \alpha_r] = P(\delta_{ri} = 1 | x, \alpha_r) = \dfrac{\exp\{\alpha_r' x\}}{1 + \exp\{\alpha_r' x\}}.$
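
As a quick numerical illustration, the mixing proportion in (2) is the standard logistic link applied to the linear score $\alpha_r' x$; a minimal sketch (NumPy; the function name is hypothetical):

```python
import numpy as np

def mixing_prop(x, alpha_r):
    """Delta_r(x | alpha_r) = exp(alpha_r'x) / (1 + exp(alpha_r'x)):
    probability that a subject with profile x belongs to the
    treatment-favorable subgroup of treatment r."""
    z = np.dot(x, alpha_r)
    return 1.0 / (1.0 + np.exp(-z))  # numerically equivalent logistic form
```

At $\alpha_r' x = 0$ the proportion is exactly 1/2, and it increases monotonically in the score.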

We may use different regression coefficients $\beta$ and/or different error densities for different treatments and subgroups. However, in a phase II clinical trial the sample size is typically not large, which makes simultaneous estimation of all these parameters difficult, so we assume the same $\beta$ for both treatments and errors of the same density form but with different locations $\mu_r$. Denote $\mu = (\mu_1, \mu_2)'$ and $\theta = (\alpha_1', \alpha_2', \beta', \mu')'$ the vector of all the Euclidean parameters, unknown and to be estimated.

Conditioning on x, we specify the density of $\epsilon_r$ (r = 1, 2) as the following mixture:

(3) $g_r(\epsilon_r | \theta, f, x) = \Delta_r(x) f(y - \beta' x - \mu_r) + (1 - \Delta_r(x)) f(y - \beta' x), \quad (\theta, f) \in (\Theta, \mathcal{F}),$

where, for each r, $\Delta_r(x)$ and $1 - \Delta_r(x)$ are the conditional mixing proportions for treatment-r patients in the treatment-favorable and non-favorable groups, and f is the common error density. We define $\Theta$ and $\mathcal{F}$ as

$\Theta = \{\theta: \mu_r > 0, r = 1, 2\}$ and $\mathcal{F} = \{f(\cdot): f \text{ is a unimodal density with mode at } 0\}.$

In practice, any subjectively specified model deviates more or less from the underlying data-generating model and may yield biased inference. Numerous methods have been proposed to address this issue, as noted in the Introduction. We now examine what the model above offers.

First, it is known that a mixture of nonparametric densities is not identifiable in general. Bordes et al. [22] showed that if the component densities are equal and symmetric around 0, plus a few conditions, then the nonparametric mixture is identifiable. Hohmann and Holzmann [23] studied the identifiability of the nonparametric mixture model in the more general form

$F(y | x) = (1 - \lambda(x)) F_1(y) + \lambda(x) F_2(y).$

They showed that the above model is non-parametrically identifiable if

$\lim_{y \to -\infty} F_2(y)/F_1(y) = 0 \quad \text{and} \quad \lim_{y \to \infty} (1 - F_1(y))/(1 - F_2(y)) = 0.$

By L’Hôpital’s rule, in terms of the density functions $f_1$ and $f_2$, the above is equivalent to

$\lim_{y \to -\infty} f_2(y)/f_1(y) = 0 \quad \text{and} \quad \lim_{y \to \infty} f_1(y)/f_2(y) = 0.$

The above is satisfied for most commonly used distributions, such as normal distributions with different means.
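
For instance, with two unit-variance normal densities whose means differ, both tail-ratio limits hold; this is easy to verify numerically (a small check added for illustration, not from the paper):

```python
import math

def normal_pdf(y, mu):
    """Unit-variance normal density with mean mu."""
    return math.exp(-0.5 * (y - mu) ** 2) / math.sqrt(2.0 * math.pi)

# f2 has the larger mean (2), f1 has mean 0: the left-tail ratio f2/f1 and
# the right-tail ratio f1/f2 both shrink toward 0 as |y| grows.
left_ratio = normal_pdf(-10.0, 2.0) / normal_pdf(-10.0, 0.0)   # exp(-22)
right_ratio = normal_pdf(10.0, 0.0) / normal_pdf(10.0, 2.0)    # exp(-18)
```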

Our model (3) is a special case of that in [23], and the identifiability condition can be met.

Remark 1:

Here, we do not require $E(\epsilon_{ri}) = 0$. Our interest lies in the regression coefficients $\beta$, the treatment effect differences $\mu_r$ and the density f(⋅). The actual treatment effects are not specified in model (3), as they are confounded with the mode or mean of f(⋅). The mode can also be estimated by its MLE, but that estimate has a slow convergence rate of $n^{1/5}$ [20], which is the best attainable rate for mode estimation in the sense of [23]. The mean of f(⋅) should also be estimable, and then the actual effects of the treatments can be recovered together with the estimates of the $\mu_r$’s.

Although our method applies to multi-stage clinical trials, here we focus on model parameter estimation; when no confusion arises, from now on we write n for $n_j$ to give a general description. The log-likelihood of the observed data $D_n$ at the end of stage j is

$\ell(\theta, f | D_n) = \sum_{i=1}^{n} \big[ (2 - t_i) \log\big( \Delta_1(x_i|\alpha_1) f(y_i - \beta' x_i - \mu_1) + (1 - \Delta_1(x_i|\alpha_1)) f(y_i - \beta' x_i) \big) + (t_i - 1) \log\big( \Delta_2(x_i|\alpha_2) f(y_i - \beta' x_i - \mu_2) + (1 - \Delta_2(x_i|\alpha_2)) f(y_i - \beta' x_i) \big) \big].$

Suppose that the law of the observed data falls in our model and is identified by the “true” parameter ( θ 0, f 0). We estimate ( θ 0, f 0) by the semiparametric MLE ( θ ˆ n , f ˆ n ) ,

(4) $(\hat\theta_n, \hat f_n) = \arg\max_{(\theta, f) \in (\Theta, \mathcal{F})} \ell(\theta, f | D_n).$

Generally, a nonparametric maximum likelihood estimate (NPMLE) of a density function does not exist, but under shape constraints such as log-concavity or unimodality with a known mode it does exist and is well studied, so here $\hat f_n$ exists uniquely. However, parameter estimation under the additive mixture model is difficult, and it is more convenient to work with the geometric (product-form) mixture under the "complete" data $D_n^c = \{(y_i, x_i, t_i, \delta_{t_i,i}): i = 1, \ldots, n\}$, with $\delta_{ri} = 1$ if the ith subject is from treatment r’s favorable subgroup, and $\delta_{ri} = 0$ otherwise. The $\delta_{ri}$’s are treated as missing data. Under $D^c = (y, x, t, \delta_t)$, given t = r and covariate x, the conditional density is

$g_r(\epsilon_r | x) = \big( \Delta_r(x|\alpha_r) f(y - \beta' x - \mu_r) \big)^{\delta_r} \big( (1 - \Delta_r(x|\alpha_r)) f(y - \beta' x) \big)^{1 - \delta_r}, \quad (\theta, f) \in (\Theta, \mathcal{F}).$

Conditioning on X n , the log-likelihood under D n c is

(5) $\ell(\theta, f | D_n^c) = \sum_{i=1}^{n} \Big\{ (2 - t_i) \big[ \delta_{1i} \big( \log f(y_i - \beta' x_i - \mu_1) + \log \Delta_1(x_i|\alpha_1) \big) + (1 - \delta_{1i}) \big( \log f(y_i - \beta' x_i) + \log(1 - \Delta_1(x_i|\alpha_1)) \big) \big] + (t_i - 1) \big[ \delta_{2i} \big( \log f(y_i - \beta' x_i - \mu_2) + \log \Delta_2(x_i|\alpha_2) \big) + (1 - \delta_{2i}) \big( \log f(y_i - \beta' x_i) + \log(1 - \Delta_2(x_i|\alpha_2)) \big) \big] \Big\}.$

Computationally, the EM algorithm [24] is used to handle the missing $\delta_{ri}$’s: given starting values $(\theta^{(0)}, f^{(0)})$, compute $(\theta^{(1)}, f^{(1)})$, …; given $(\theta^{(k)}, f^{(k)})$, obtain the next iterates $(\theta^{(k+1)}, f^{(k+1)})$, …, until the sequence converges. The procedure alternates the E-step and M-step below.

E-step. Denote $\delta = \{\delta_{t_i,i}: i = 1, \ldots, n\}$. Let

$Q(\theta, f | \theta^{(k)}, f^{(k)}) = E_{\delta}\big[ \ell(\theta, f | D_n^c) \,\big|\, \theta^{(k)}, f^{(k)}, D_n \big],$

where the expectation is with respect to the missing data δ r i under the current estimated parameter values ( θ ( k ) , f ( k ) ) and the observed data D n . We obtain

$Q(\theta, f | \theta^{(k)}, f^{(k)}) = \sum_{i=1}^{n} \Big\{ (2 - t_i) \big[ \lambda_1^{(k)}(x_i) \big( \log f(y_i - \beta' x_i - \mu_1) + \log \Delta_1(x_i|\alpha_1) \big) + (1 - \lambda_1^{(k)}(x_i)) \big( \log f(y_i - \beta' x_i) + \log(1 - \Delta_1(x_i|\alpha_1)) \big) \big] + (t_i - 1) \big[ \lambda_2^{(k)}(x_i) \big( \log f(y_i - \beta' x_i - \mu_2) + \log \Delta_2(x_i|\alpha_2) \big) + (1 - \lambda_2^{(k)}(x_i)) \big( \log f(y_i - \beta' x_i) + \log(1 - \Delta_2(x_i|\alpha_2)) \big) \big] \Big\},$

where, for (y i , x i , t i ) with t i  = r (r = 1, 2),

$\lambda_r^{(k)}(x_i) = E(\delta_{ri} | y_i, x_i, \theta^{(k)}, f^{(k)}) = P(\delta_{ri} = 1 | y_i, x_i, \theta^{(k)}, f^{(k)})$

$= \dfrac{\Delta_r(x_i|\alpha_r^{(k)}) f^{(k)}(y_i - \beta^{(k)\prime} x_i - \mu_r^{(k)})}{\Delta_r(x_i|\alpha_r^{(k)}) f^{(k)}(y_i - \beta^{(k)\prime} x_i - \mu_r^{(k)}) + (1 - \Delta_r(x_i|\alpha_r^{(k)})) f^{(k)}(y_i - \beta^{(k)\prime} x_i)}.$

M-step. Let

$(\theta^{(k+1)}, f^{(k+1)}) = \arg\max_{(\theta, f) \in (\Theta, \mathcal{F})} Q(\theta, f | \theta^{(k)}, f^{(k)}).$

In particular, with $I(\cdot)$ being the indicator function,

$\alpha_r^{(k+1)} = \arg\max_{\alpha_r} \sum_{i=1}^{n} I(t_i = r) \big( \lambda_r^{(k)}(x_i) \log \Delta_r(x_i|\alpha_r) + (1 - \lambda_r^{(k)}(x_i)) \log(1 - \Delta_r(x_i|\alpha_r)) \big),$

$(\beta^{(k+1)}, \mu^{(k+1)}, f^{(k+1)}) = \arg\max_{(\beta, \mu) \in \Theta, f \in \mathcal{F}} \sum_{i=1}^{n} \big[ (2 - t_i) \big( \lambda_1^{(k)}(x_i) \log f(y_i - \beta' x_i - \mu_1) + (1 - \lambda_1^{(k)}(x_i)) \log f(y_i - \beta' x_i) \big) + (t_i - 1) \big( \lambda_2^{(k)}(x_i) \log f(y_i - \beta' x_i - \mu_2) + (1 - \lambda_2^{(k)}(x_i)) \log f(y_i - \beta' x_i) \big) \big].$

Computation of $f^{(k+1)}$. The computation of $f^{(k+1)}$ is non-trivial; we describe the procedure below. For fixed $\theta^{(k)}$, set $\epsilon_{i0}^{(k)} = y_i - \beta^{(k)\prime} x_i$ and $\epsilon_{i1}^{(k)} = y_i - \beta^{(k)\prime} x_i - \mu_{t_i}^{(k)}$, and write

(6) $f^{(k+1)}(\cdot) = \arg\max_{f \in \mathcal{F}} \sum_{i=1}^{n} \big[ (2 - t_i) \big( \lambda_1^{(k)}(x_i) \log f(\epsilon_{i1}^{(k)}) + (1 - \lambda_1^{(k)}(x_i)) \log f(\epsilon_{i0}^{(k)}) \big) + (t_i - 1) \big( \lambda_2^{(k)}(x_i) \log f(\epsilon_{i1}^{(k)}) + (1 - \lambda_2^{(k)}(x_i)) \log f(\epsilon_{i0}^{(k)}) \big) \big].$

To simplify notation, write $(\epsilon_{ij}^{(k)}, \lambda_r^{(k)}(x_i), f^{(k+1)})$ as $(\epsilon_{ij}, \lambda_{ri}, \hat f)$. The computation of $\hat f(\cdot)$ proceeds as follows. Arrange the $\epsilon_{ij}$’s in increasing order and denote them $\{\epsilon_i: i = 1, \ldots, N = 2n\}$; let $\lambda_i$ be the corresponding $\lambda_{ri}$ when $\epsilon_i$ comes from an original $\epsilon_{i1}$, and $1 - \lambda_{ri}$ when it comes from an original $\epsilon_{i0}$; denote $f_i = f(\epsilon_i)$. Let m be the integer such that $\epsilon_m < 0 < \epsilon_{m+1}$, and set $w_i = \epsilon_{i+1} - \epsilon_i$ ($1 \le i \le m-1$), $w_m = -\epsilon_m$, $w_{m+1} = \epsilon_{m+1}$, and $w_i = \epsilon_i - \epsilon_{i-1}$ ($m+2 \le i \le N$). Because $\hat f(\cdot)$ is a step function taking the value zero on $(-\infty, \epsilon_1] \cup [\epsilon_N, \infty)$, the constraint $\int f(t)\,dt = 1$ is written as

$\sum_{j=1}^{m-1} f_j(\epsilon_{j+1} - \epsilon_j) + f_m(0 - \epsilon_m) + f_{m+1}(\epsilon_{m+1} - 0) + \sum_{j=m+2}^{N} f_j(\epsilon_j - \epsilon_{j-1}) = 1.$

Let $g_i = \lambda_i / (N w_i)$; then, as in [25],

(7) $\hat f = \arg\min_{\{f_i\}:\, f_1 \le \cdots \le f_m,\; f_{m+1} \ge \cdots \ge f_N} \sum_{i=1}^{N} w_i (g_i - f_i)^2.$

The above is the standard form of isotonic regression, and $\hat f$ can be computed using the pool adjacent violators algorithm (PAVA; see, for example, [26], [27] and the R function ufit), a convenient computational tool for performing order-restricted maximization or minimization.
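
A minimal weighted PAVA, plus the unimodal fit obtained by running it upward to the left of zero and downward to the right, can be sketched as follows (plain NumPy; an illustrative sketch, not the ufit implementation, and the fitted step function would still be renormalized to satisfy the integral constraint):

```python
import numpy as np

def pava(g, w):
    """Weighted pool-adjacent-violators: the nondecreasing fit minimizing
    sum_i w_i (g_i - f_i)^2, built by merging violating adjacent blocks."""
    vals, wts, sizes = [], [], []
    for gi, wi in zip(g, w):
        vals.append(float(gi)); wts.append(float(wi)); sizes.append(1)
        while len(vals) > 1 and vals[-2] > vals[-1]:
            v = (wts[-2] * vals[-2] + wts[-1] * vals[-1]) / (wts[-2] + wts[-1])
            wn, sn = wts[-2] + wts[-1], sizes[-2] + sizes[-1]
            vals = vals[:-2] + [v]; wts = wts[:-2] + [wn]; sizes = sizes[:-2] + [sn]
    out = []
    for v, s in zip(vals, sizes):
        out.extend([v] * s)
    return np.array(out)

def unimodal_fit(g, w, m):
    """Unimodal fit with mode between positions m-1 and m (0-based), as in
    problem (7): nondecreasing on the first m points, nonincreasing after."""
    left = pava(g[:m], w[:m])
    right = pava(g[m:][::-1], w[m:][::-1])[::-1]  # decreasing fit via reversal
    return np.concatenate([left, right])
```

For example, `unimodal_fit` pools the interior violation in `[1, 3, 2, 3, 1]` (mode split after position 2) into a flat block while leaving the monotone ends untouched.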

It is known that the EM algorithm is not guaranteed to converge to the MLE; in general it converges to a local maximum [28]. If the underlying log-likelihood is concave, the EM algorithm converges to the global MLE. Thus, in applications, one should run the EM algorithm from different starting values to obtain possibly different local stationary points and compare the log-likelihoods at these points to select at least a better local maximum.

Our algorithm is a semiparametric version of the EM algorithm; see also [29] for bio-medical applications of this algorithm. The semiparametric and nonparametric EM algorithms were used in a large number of studies, such as in [30], [31], [32]; Chen et al. [33] applied the EM algorithm to a semiparametric random effects model, and Bordes et al. [34] applied the EM algorithm to a semiparametric mixture model, using simulation studies to justify the convergence of the algorithm. Balan and Putter [35] developed an R-package for the EM algorithm for semiparametric shared frailty models.

EM algorithm under $\mu_1 = \mu_2$. One of our goals is to test the null hypothesis $\mu_1 = \mu_2 := \mu$. In this case, the EM algorithm needs to be adapted. Similarly to the previous case, now with $\theta = (\alpha_1', \alpha_2', \beta', \mu)'$ and its estimate $\hat\theta$, in the E-step we have

$Q(\theta, f | \theta^{(k)}, f^{(k)}) = \sum_{i=1}^{n} \Big\{ (2 - t_i) \big[ \lambda_1^{(k)}(x_i) \big( \log f(y_i - \beta' x_i - \mu) + \log \Delta_1(x_i|\alpha_1) \big) + (1 - \lambda_1^{(k)}(x_i)) \big( \log f(y_i - \beta' x_i) + \log(1 - \Delta_1(x_i|\alpha_1)) \big) \big] + (t_i - 1) \big[ \lambda_2^{(k)}(x_i) \big( \log f(y_i - \beta' x_i - \mu) + \log \Delta_2(x_i|\alpha_2) \big) + (1 - \lambda_2^{(k)}(x_i)) \big( \log f(y_i - \beta' x_i) + \log(1 - \Delta_2(x_i|\alpha_2)) \big) \big] \Big\},$

where, for (y i , x i , t i ) with t i  = r (r = 1, 2),

$\lambda_r^{(k)}(x_i) = \dfrac{\Delta_r(x_i|\alpha_r^{(k)}) f^{(k)}(y_i - \beta^{(k)\prime} x_i - \mu^{(k)})}{\Delta_r(x_i|\alpha_r^{(k)}) f^{(k)}(y_i - \beta^{(k)\prime} x_i - \mu^{(k)}) + (1 - \Delta_r(x_i|\alpha_r^{(k)})) f^{(k)}(y_i - \beta^{(k)\prime} x_i)}.$

The M-step is carried out the same way as before with the current definition of θ and θ ˆ . In particular,

$(\beta^{(k+1)}, \mu^{(k+1)}, f^{(k+1)}) = \arg\max_{(\beta, \mu) \in \Theta, f \in \mathcal{F}} \sum_{i=1}^{n} \big[ (2 - t_i) \big( \lambda_1^{(k)}(x_i) \log f(y_i - \beta' x_i - \mu) + (1 - \lambda_1^{(k)}(x_i)) \log f(y_i - \beta' x_i) \big) + (t_i - 1) \big( \lambda_2^{(k)}(x_i) \log f(y_i - \beta' x_i - \mu) + (1 - \lambda_2^{(k)}(x_i)) \log f(y_i - \beta' x_i) \big) \big].$

The remaining computations are the same as before.

2.2 The multi-stage adaptive targeted trial

At the end of each interim stage, the model parameter estimates are updated based on all the current data, and analysis is performed to check (1) whether the treatment benefit for the biomarker-positive subgroup in the ongoing trial is comparable with the study expectations and (2) whether the difference between the two treatments in the effect-positive subgroup exceeds the early-stopping thresholds. The two objectives are described below.

2.2.1 Test subgroup effect

Under model (1), profile (biomarker) performance can be evaluated by the proportion of the treatment-favorable subgroup among all patients. At stage j, for treatment r, let $\bar\lambda_r = n_{rj}^{-1} \sum_{i=1}^{n_{rj}} \lambda_r(x_i | \hat\alpha_r)$. If $\bar\lambda_r \ge \lambda_0$ (r = 1, 2), for some pre-specified $\lambda_0$, then the trial continues to the next stage; otherwise it is stopped.
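
This stage-continuation check is a one-liner once the estimated membership probabilities are in hand; a minimal sketch (names hypothetical):

```python
import numpy as np

def continue_to_next_stage(lam_by_arm, lam0):
    """Continue only if the average estimated favorable-subgroup probability
    reaches the pre-specified threshold lam0 in every treatment arm."""
    return all(float(np.mean(lam)) >= lam0 for lam in lam_by_arm)
```

For example, with arm averages 0.35 and 0.30, the trial continues under a 20% threshold but stops under a 35% threshold.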

Another approach is hypothesis testing. At stage j, denote $(\hat\theta, \hat f)$ the estimate at the end of this stage, omitting the sample-size sub-index for simplicity of notation. Consider $H_r: \mu_r = 0$ versus $H_r^c: \mu_r > 0$ (r = 1, 2).

For a parametric model, commonly used test statistics include the likelihood ratio, score and Wald statistics; in the parametric case, the three statistics are asymptotically chi-squared distributed and equivalent. However, in the semiparametric mixture model, when $\mu_1 = 0$ or $\mu_2 = 0$ the model is non-identifiable, although the other parameters are still identifiable and estimable. The situation is similar to that of subgroup analysis, as in [13]. In this case, the likelihood ratio statistic cannot be applied, so we use the Wald statistic.

For the Wald test in the general case, write $\theta = (\theta_1', \theta_2')'$ with $\dim(\theta) = d$ and $\dim(\theta_1) = d_1$, and let $\hat\theta = (\hat\theta_1', \hat\theta_2')'$ be the MLE of $\theta$ under the full model. Consider the null hypothesis $H_0: \theta_1 = \theta_{1,0}$. The Wald test statistic is given by

$W_n = (\hat\theta_1 - \theta_{1,0})' \mathrm{Cov}^{-1}(\hat\theta_1) (\hat\theta_1 - \theta_{1,0}).$

For our problem, $\theta_1 = \mu_r$ and $\theta_{1,0} = 0$. As noted in Section 3, the asymptotic distribution of $\hat\theta$ is an open question, so for a given level $\alpha$, we compute the critical value $Q_1(1 - \alpha)$ by simulation under the null. If $W_{n,r} > Q_1(1 - \alpha)$, then $H_r$ is rejected.

For subgroup analysis, since the likelihood ratio test cannot be used, Shen and He [13] artificially select a finite set in the multivariate parameter space and construct a null likelihood ratio at each fixed parameter value; they then maximize over this set to form their test statistic. The weak limit of their statistic is the maximum of a set of mutually dependent chi-square variables and has no closed form. Their method has theoretical depth; however, its critical value is also obtained via simulation. More points must be chosen for a higher-dimensional parameter space, which may be computationally demanding, and the inference results depend on the choice of the selected fixed parameter points. In contrast, our method is easier to use.

2.2.2 Test treatment difference

For objective (2), at stage j, denote $(\hat\theta, \hat f)$ the estimate at the end of this stage, omitting the sample-size sub-index for simplicity of notation. Objective (2) can be evaluated by testing the null hypothesis $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \neq \mu_2$.

Again, we use the Wald test: let $W_n = (\hat\mu_1 - \hat\mu_2)' \mathrm{Cov}^{-1}(\hat\mu_1 - \hat\mu_2)(\hat\mu_1 - \hat\mu_2)$, and compute the critical value $Q_2(1 - \alpha)$ by simulation under the null. If $W_n > Q_2(1 - \alpha)$, $H_0$ is rejected, and the trial continues.
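
Both Wald tests in this section share the same quadratic form and the same simulate-under-the-null calibration; a minimal sketch (NumPy; function names hypothetical):

```python
import numpy as np

def wald_statistic(d, cov):
    """W_n = d' Cov^{-1} d, for d = theta1_hat - theta_{1,0}
    or d = mu1_hat - mu2_hat (scalar or vector)."""
    d = np.atleast_1d(np.asarray(d, dtype=float))
    C = np.atleast_2d(np.asarray(cov, dtype=float))
    return float(d @ np.linalg.solve(C, d))

def simulated_critical_value(null_statistics, alpha):
    """Empirical (1 - alpha)-quantile of Wald statistics recomputed on
    datasets simulated under the null hypothesis."""
    return float(np.quantile(null_statistics, 1.0 - alpha))
```

In the scalar case, $W_n$ reduces to the squared estimate divided by its variance, e.g. $d = 2$ with variance 4 gives $W_n = 1$.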

3 Asymptotic results

Recall the definition of $g_r(\epsilon_r | \theta, f, x)$ given in (3). Let $s = (y, x, t)$ and $\gamma = P(t = 1)$. Omitting the density of x, the density-mass function of s is

$g(s | \theta, f) = \big( \gamma\, g_1(y, x | \theta, f) \big)^{2 - t} \big( (1 - \gamma)\, g_2(y, x | \theta, f) \big)^{t - 1}.$

Denote $\ell(s | \theta, f) = \log g(s | \theta, f)$ and $\|f - f_0\| = \int |f(t) - f_0(t)|\, dt$, and let $\|\hat\theta_n - \theta_0\|$ be the Euclidean distance between $\hat\theta_n$ and $\theta_0$. We need the following regularity conditions:

  (C1) $\Theta$ is bounded.

  (C2) The support of x is bounded.

  (C3) $\sup_{f \in \mathcal{F}} f(0) < \infty$.

  (C4) $f_0 \in \mathcal{F}$ and $f_0(\cdot)$ is continuous.

  (C5) $\ell(s | \theta, f)$ is Lipschitz in $(\theta, f)$.

Theorem 1:

Assume (C1)–(C5); then as $n \to \infty$,

$\|\hat\theta_n - \theta_0\| + \|\hat f_n - f_0\| \stackrel{a.s.}{\longrightarrow} 0.$

As pointed out in [36] (Section 3.2.2), [37] (Section 23) and [38], the $\sqrt{n}$-consistency of the MLE $\hat\theta_n$ in the semiparametric index model is still unknown. The reason is that $\hat\theta_n$ is bundled inside $\hat f_n(\cdot)$, which is a non-smooth step function, and the profile likelihood of $\theta$ involves the derivative of $\hat f_n(\cdot)$. Below, we give the convergence rates of $\hat\theta_n$ and $\hat f_n$.

Theorem 2:

Assume (C1)–(C5); then

$\|\hat\theta_n - \theta_0\| + \|\hat f_n - f_0\| = O_p(n^{-1/3}).$

Define $\lambda(x_i, \eta_i) = \lambda_1^{2 - \eta_i}(x_i)\, \lambda_2^{\eta_i - 1}(x_i)$,

$\Lambda_I = E_{(\theta_0, f_0)} \lambda(x, \eta) = \gamma E[\lambda_1^2(x)] + (1 - \gamma) E[\lambda_2^2(x)],$

and $\Lambda_{II} = E_{(\theta_0, f_0)} \lambda^2(x, \eta) = \gamma E[\lambda_1^3(x)] + (1 - \gamma) E[\lambda_2^3(x)].$

Denote $\stackrel{D}{\longrightarrow}$ for convergence in distribution. Let $B(\cdot)$ be the two-sided Brownian motion originating from zero: a mean-zero Gaussian process on $\mathbb{R}$ with $B(0) = 0$ and $E(B(s) - B(h))^2 = |s - h|$ for all $s, h \in \mathbb{R}$.

  (C6) Let $\dot f_0(\cdot)$ denote the derivative of $f_0(\cdot)$; assume $\dot f_0(t) \neq 0$.

Theorem 3:

Assume (C1)–(C6); then

$n^{1/3} \big( \hat f_n(t) - f_0(t) \big) \stackrel{D}{\longrightarrow} \Big( \dfrac{4 \Lambda_{II} |\dot f_0(t)| f_0(t)}{\Lambda_I^2} \Big)^{1/3} \arg\max_{h \in \mathbb{R}} \{ B(h) - h^2 \}.$

4 Application to real data

We analyze the real data from the AIDS Clinical Trials Group (ACTG) 175 study with the proposed method. The drugs used in the trial were provided by three manufacturers: Glaxo-Wellcome Company (AZT), Bristol-Myers Squibb Inc (ddI), and Hoffmann-La Roche Inc (ddC). The trial was conducted at ACTG’s 43 sites with both adult and pediatric patients and at nine sites of the National Hemophilia Foundation. Participants were enrolled in the study between December 1991 and October 1992 and received treatment through December 1994. Follow-up and final evaluations of participants took place between December 1994 and February 1995.

The purpose of the trial was to evaluate whether treatment of HIV infection with one drug (monotherapy) was the same as, better than, or worse than treatment with two drugs (combination therapy) in patients with CD4+ T cell counts between 200 and 500/mm3. The results of the original ACTG 175 trial, together with the results from earlier studies, demonstrate that antiretroviral therapy is beneficial to HIV-infected people who have fewer than 500 CD4+ T cells/mm3.

We analyze these data using the proposed method on treatment 1 (with AZT) and treatment 4 (with ddI), treating them as the first stage of a multi-stage adaptive trial. There were 532 and 561 patients in treatments 1 and 4, respectively. In this analysis, the response variable is the CD4 count after 20 weeks of the corresponding treatment, square-root transformed, and the covariates are baseline age (re-scaled to 10-year increments) and baseline CD4 count, also square-root transformed.

As a comparison, we also analyze the data using a parametric normal model (NM). For the semiparametric method, we fit the data assuming f(⋅) unimodal (at zero) and either symmetric (SP1) or asymmetric (SP2). In the EM algorithm, we first fit a linear regression on the covariates to obtain the starting values $\beta^{(0)}$. The initial membership (for the treatment-favorable subgroup) classification cycles through three magnitudes, 1/2, 1/4 and 1/8 of the standard deviation of the residuals, as the choice of $\mu_r^{(0)}$. Logistic regression is then applied, using the candidate membership as the response on the set of covariates (again with the intercept suppressed), to obtain the starting value $\alpha_r^{(0)}$. This set of initial values, $\theta^{(0)} = (\mu_r^{(0)}, \alpha_r^{(0)}, \beta^{(0)})$, is fed into the NM and SP models for the EM algorithm to compute $\theta^{(k)} = (\mu_r^{(k)}, \alpha_r^{(k)}, \beta^{(k)})$. Convergence of the algorithm is assessed by the criterion $\|\theta^{(k+1)} - \theta^{(k)}\| / \|\theta^{(k)}\| \le \rho$, where $\rho = 10^{-3}$.
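
The starting-value construction described above can be sketched as follows (NumPy; a short ridge-stabilized Newton iteration stands in for the logistic regression fit, and all names are hypothetical):

```python
import numpy as np

def starting_values(y, X, frac):
    """OLS gives beta(0); residuals exceeding frac * sd(residuals) flag a
    candidate favorable membership, whose magnitude serves as mu_r(0);
    a few Newton steps fit the logistic model of the candidate membership
    on X to give alpha_r(0)."""
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta0
    mu0 = frac * float(resid.std())
    member = (resid > mu0).astype(float)
    alpha0 = np.zeros(X.shape[1])
    for _ in range(15):                      # Newton-Raphson for logistic fit
        p = 1.0 / (1.0 + np.exp(-(X @ alpha0)))
        grad = X.T @ (member - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X + 1e-3 * np.eye(X.shape[1])
        alpha0 = alpha0 + np.linalg.solve(hess, grad)
    return beta0, mu0, alpha0
```

The small ridge term on the Hessian keeps the iteration stable when the candidate memberships happen to be linearly separable in the covariates.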

The NPMLE under shape constraint(s) is uniquely obtained as a formulation of the isotonic regression problem.

The results of the parameter estimation are presented in Table 1, in which "Est" stands for estimate and "[sd]" stands for estimated standard deviation. We observe that treatment-favorable subgroups are identified based on the profile of the two baseline biomarkers by both the parametric approach (NM) and the semiparametric approaches (SP1 and SP2). The estimates of $\beta$ are comparable across the three models, but the NM approach produces different membership profiles, judging by $\bar\lambda_r$, the mean of the estimated $\lambda_r$’s. Interestingly, the two semiparametric approaches estimate similar membership profiles, although the model with the symmetry assumption (SP1) produces slightly larger $\mu_r$ than the asymmetric model (SP2). In general, the SP models produce larger $\mu_r$ values than the NM method. When a 20% threshold is used for $\lambda_0$, both approaches warrant proceeding with the hypothetical multi-stage trial. We tend to favor the results from the SP models because they make no distributional assumption except unimodality at zero. In the next section, we further study the methods by simulation.

Table 1:

Parameter estimates under three models on ACTG 175 data.

θ β α 1 α 2 μ 1 μ 2
NM Est (0.25, 9.42) (−0.09, −0.06) (−0.10, −0.19) 0.27 0.34
[sd] [0.18, 0.35] [0.13, 0.26] [0.21, 0.38] [0.14] [0.28]
SP1 Est (0.23, 9.48) (−0.02, −0.23) (−0.25, −0.07) 0.81 0.68
[sd] [0.12, 0.27] [0.13, 0.15] [0.14, 0.25] [0.24] [0.33]
SP2 est (0.25, 9.50) (0.03, −0.37) (−0.23, 0.07) 0.69 0.62
[sd] [0.07, 0.45] [0.21, 0.42] [0.15, 0.29] [0.30] [0.21]

5 Simulation study

Utilizing the data structure of the real trial analyzed above, we simulate n = 1000 subjects, with $n_1 = n_2 = 500$, one-dimensional responses $y_i$ and covariates $x_{ri} = (x_{ri,1}, x_{ri,2})'$. We first generate the covariates, sampling the $x_{ri}$ from the bivariate normal distribution with mean vector $(3.52, 1.85)'$ and covariance matrix $\begin{pmatrix} 0.75 & 0.01 \\ 0.01 & 0.09 \end{pmatrix}$. We then generate the response data, assigning the first $n_1 = 500$ subjects to treatment 1 and the rest to treatment 2. Given the covariates, $y_{ri} = \beta' x_{ri} + \mu_r + \epsilon_{ri}$ with probability $\Delta_r(x_{ri}|\alpha_r)$, and $y_{ri} = \beta' x_{ri} + \epsilon_{ri}$ with probability $1 - \Delta_r(x_{ri}|\alpha_r)$. The values of $\beta$, $\mu_r$ and $\alpha_r$ are taken from the real data analysis, where $\Delta_r(x|\alpha_r) = \exp(\alpha_r' x)/(1 + \exp(\alpha_r' x))$. For the error term $\epsilon$, we simulate two types of unimodal density: one based on a Laplace distribution, which is symmetric, and one based on a skewed normal distribution.
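
The data-generating scheme for one treatment arm can be sketched as follows (NumPy; the parameter values shown in the usage line are reused from the analysis above for illustration, and the error sampler `rvs` is supplied by the caller):

```python
import numpy as np

def simulate_arm(n, beta, mu_r, alpha_r, rvs, rng):
    """Simulate one treatment arm: bivariate-normal covariates, latent
    favorable membership from the logistic model, response from model (1)."""
    x = rng.multivariate_normal([3.52, 1.85],
                                [[0.75, 0.01], [0.01, 0.09]], size=n)
    member = rng.random(n) < 1.0 / (1.0 + np.exp(-(x @ alpha_r)))
    y = x @ beta + member * mu_r + rvs(n)   # extra effect mu_r only if member
    return y, x, member
```

For example, treatment 1 with Laplace errors: `rng = np.random.default_rng(0); y, x, m = simulate_arm(500, np.array([0.23, 9.48]), 0.81, np.array([-0.02, -0.23]), lambda k: rng.laplace(0.0, 1.0, k), rng)`.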

We also compare the performance of the estimates from the parametric model where a normal distribution is assumed for f ( ) . We generate 200 simulation runs, and the results are summarized in Tables 2 and 3, where “cr” is the 95% coverage rate for the true parameter. Figure 1 provides a representative model fitting for the density functions during the simulation.

Table 2:

Simulation results for parameter estimates under parametric model (NM) and semiparametric model (SP) (error term from Laplace distribution).

θ β α 1 α 2 μ 1 μ 2
θ 0 (0.23, 9.48) (−0.02, −0.23) (−0.25, −0.07) 0.81 0.68
SP Est (0.22, 9.49) (0.04, −0.44) (−0.05, −0.42) 0.72 0.68
[sd] [0.10, 0.33] [0.20, 0.37] [0.13, 0.25] [0.14] [0.21]
cr (0.94, 0.95) (0.90, 0.91) (0.90, 0.84) 0.95 0.96
NM Est (0.24, 9.54) (0.07, −0.51) (−0.09, −0.44) 0.53 0.66
[sd] [0.20, 0.37] [0.34, 0.49] [0.16, 0.30] [0.31] [0.32]
cr (0.93, 0.93) (0.87, 0.87) (0.52, 0.78) 0.87 0.96
Table 3:

Simulation results for parameter estimates under parametric model (NM) and semiparametric model (SP) (error term from skewed normal distribution).

θ β α 1 α 2 μ 1 μ 2
θ 0 (0.25, 9.50) (0.03, −0.37) (−0.23, 0.07) 0.68 0.62
SP Est (0.23, 9.61) (−0.04, −0.95) (−0.04, −0.75) 0.62 0.56
[sd] [0.15, 0.99] [0.26, 0.70] [0.23, 0.59] [0.33] [0.28]
cr (0.92, 0.93) (0.89, 0.90) (0.91, 0.89) 0.94 0.95
NM Est (0.27, 9.95) (0.01, −0.49) (−0.01, −0.39) 0.57 0.55
[sd] [0.13, 0.68] [0.21, 0.67] [0.16, 0.56] [0.31] [0.26]
cr (0.93, 0.83) (0.78, 0.86) (0.51, 0.70) 0.90 0.84
Figure 1: 
Plot of estimated density function from the semiparametric approach (solid blue line) when the true distribution is Laplace (left panel) or skewed normal (right panel). The solid line represents the density of the generating distribution. The dashed line represents a normal curve from the parametric model.

Overall, the models produce the most accurate estimates for the regression coefficients β, followed by the estimates of μ_r and then of α_r. As expected, compared with the normal-distribution-based model, the semiparametric method produces much more accurate estimates, with coverage rates very close to the nominal level.

6 Concluding remarks

We have proposed and studied a semiparametric model for targeted group sequential clinical trials in which the response–covariate relationship is specified by a linear regression model and the error distribution is left nonparametric for model robustness, with unimodality as the only assumption. The semiparametric maximum likelihood estimate is used for model parameter estimation. Our simulation study shows that the proposed method has a clear advantage over the commonly used normal model, and the method is applied to real clinical trial data. Here we have focused on model parameter estimation. The asymptotic distribution of the regression parameter estimator in the single-index semiparametric model remains an open question: because the shape-restricted NPMLE of the nonparametric component is a non-smooth step function, it is challenging to derive the asymptotic distribution of the corresponding Wald statistic. Currently, the critical value and/or p-value of the test statistic can therefore only be obtained via simulation. In addition, it is known [13] that the likelihood ratio test does not exist for this problem, so an analytic form of the hypothesis test for the existence of subgroups is a topic of our future study.
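Since critical values must be obtained by simulation, a generic Monte Carlo recipe is to simulate the test statistic under the null and take the appropriate quantile. The sketch below is a generic illustration, not the authors' procedure; `stat_fn` and `simulate_null` are hypothetical placeholders for the Wald statistic and the null data generator.

```python
import numpy as np

def mc_critical_value(stat_fn, simulate_null, n_sim=1000, level=0.05, rng=None):
    """Monte Carlo critical value: the (1 - level) quantile of the test
    statistic computed on n_sim data sets simulated under the null."""
    rng = rng if rng is not None else np.random.default_rng()
    stats = np.array([stat_fn(simulate_null(rng)) for _ in range(n_sim)])
    return float(np.quantile(stats, 1.0 - level))

# Toy illustration: |sqrt(n) * mean| for N(0, 1) data has 95% critical value near 1.96
cv = mc_critical_value(
    stat_fn=lambda d: abs(d.mean()) * np.sqrt(d.size),
    simulate_null=lambda g: g.normal(0.0, 1.0, 100),
    n_sim=2000,
    rng=np.random.default_rng(3),
)
```

A Monte Carlo p-value follows the same pattern: the proportion of simulated null statistics at least as large as the observed one.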


Corresponding author: Ao Yuan and Ming T. Tan, Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University, Washington, DC 20057, USA, E-mail: (A. Yuan); (M.T. Tan)

Funding source: City University of New York High-Performance Computing Center

Award Identifier / Grant number: CNS-0958379

Award Identifier / Grant number: CNS-0855217

Award Identifier / Grant number: ACI-112611

Funding source: City University of New York Research Foundation

  1. Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.

  2. Research funding: This work was partially supported by the City University of New York High-Performance Computing Center, College of Staten Island, funded in part by the City and State of New York, City University of New York Research Foundation and National Science Foundation grants CNS-0958379, CNS-0855217, and ACI-112611.

  3. Conflict of interest statement: The authors declare no conflicts of interest regarding this article.

Appendix

Proof of the proposition

Suppose there are $\tilde f$ and $(\tilde\beta, \tilde\mu_r)$ such that

$$
g_r(y\mid\tilde\theta,\tilde f,x) = \Delta_r(x)\,\tilde f(y-\tilde\beta'x) + (1-\Delta_r(x))\,\tilde f(y-\tilde\beta'x-\tilde\mu_r) \equiv \Delta_r(x)\,f(y-\beta'x) + (1-\Delta_r(x))\,f(y-\beta'x-\mu_r) = g_r(y\mid\theta,f,x);
$$

we need to show that $(\tilde f, \tilde\beta, \tilde\mu_r) = (f, \beta, \mu_r)$.

First, take $x = 0$; then

$$
g_r(y\mid\tilde\theta,\tilde f,0) = \Delta_r(0)\,\tilde f(y) + (1-\Delta_r(0))\,\tilde f(y-\tilde\mu_r) \equiv \Delta_r(0)\,f(y) + (1-\Delta_r(0))\,f(y-\mu_r) = g_r(y\mid\theta,f,0).
$$

A log-concave density has a unique mode. Let $m_0$ and $\tilde m_0$ be the modes of $f(\cdot)$ and $\tilde f(\cdot)$. Without loss of generality, we assume $m_0 = \tilde m_0$; otherwise, we can shift $f(\cdot)$ by a constant so that it has mode $\tilde m_0$, which preserves log-concavity, and we still denote it by $f(\cdot)$.

Because $\tilde f, f \in \mathcal F$, there are concave functions $\tilde\phi(\cdot)$ and $\phi(\cdot)$ such that $\tilde f(\cdot) = \exp(\tilde\phi(\cdot))$ and $f(\cdot) = \exp(\phi(\cdot))$. If $\tilde f(\cdot) \not\equiv f(\cdot)$, then $\tilde\phi(\cdot) \not\equiv \phi(\cdot)$, so there is a $y_0$ with $\tilde\phi(y_0) \neq \phi(y_0)$. Without loss of generality, assume $y_0 < m_0$, where $m_0$ is the common mode of $\phi(\cdot)$ and $\tilde\phi(\cdot)$, and that $\tilde\phi(y_0) > \phi(y_0)$. Because both $\tilde\phi(\cdot)$ and $\phi(\cdot)$ are concave, the slope $\tilde s(\cdot)$ of $\tilde\phi(\cdot)$ and the slope $s(\cdot)$ of $\phi(\cdot)$ must satisfy $\tilde s(y) < s(y)$ for all $y < y_0$. Thus, for all $y < y_0$ with $|y|$ large, because $\tilde\mu_r > 0$ and $\mu_r > 0$, we will have $g_r(y\mid\tilde\theta,\tilde f,0) < g_r(y\mid\theta,f,0)$, a contradiction. Thus we must have $\tilde\phi(\cdot) \equiv \phi(\cdot)$, and so $\tilde f(\cdot) \equiv f(\cdot)$.

Because $f(\cdot)$ in model (3) is uniquely determined, it is known that $\theta$ in $g_r(y\mid\theta,f,x)$ is identifiable.

Proof of Theorem 1

Recall the definition of $s$ and $g(s\mid\theta,f)$ given before the statement of Theorem 1. Let $h(\cdot)$ be the density function of $x$, $q(s\mid\theta,f) = g(s\mid\theta,f)h(x)$, and let

$$
m(s\mid\theta,f) = \log\frac{q(s\mid\theta,f)}{q(s\mid\theta_0,f_0)} = \log\frac{g_1^{2-t}(y,x\mid\theta,f)\,g_2^{t-1}(y,x\mid\theta,f)}{g_1^{2-t}(y,x\mid\theta_0,f_0)\,g_2^{t-1}(y,x\mid\theta_0,f_0)}
$$

be the log-likelihood ratio. Let $P$ be the probability measure of $q(\cdot\mid\theta_0,f_0)$, let $P_n$ be the empirical distribution of the data $\{(y_i, x_i, t_i): i = 1, \dots, n\}$, and for any probability measure $P$ and function $q(s)$, denote $Pq = \int q(s)\,dP(s)$.

Note that $Pm(s\mid\theta,f)$ is the negative Kullback–Leibler divergence between $q(s\mid\theta,f)$ and $q(s\mid\theta_0,f_0)$, and $(\theta_0,f_0)$ is its unique maximizer by the identifiability of the model; $P_n m(s\mid\theta,f)$ is the empirical negative Kullback–Leibler divergence, and $(\hat\theta_n,\hat f_n)$ is the MLE of $(\theta_0,f_0)$; thus

$$
(\theta_0,f_0) = \arg\max_{(\theta,f)\in(\Theta,\mathcal F)} Pm(s\mid\theta,f), \quad\text{and}\quad (\hat\theta_n,\hat f_n) = \arg\max_{(\theta,f)\in(\Theta,\mathcal F)} P_n m(s\mid\theta,f).
$$

Let $d(\theta,f;\theta_0,f_0) = \|\theta-\theta_0\| + \|f-f_0\|$. The above, together with (C4) and the fact that $(\hat\theta_n,\hat f_n)$ is the MLE, implies that for all $\delta > 0$,

$$
\sup_{(\theta,f)\in(\Theta,\mathcal F):\,d(\theta,f;\theta_0,f_0)>\delta} Pm(s\mid\theta,f) < Pm(s\mid\theta_0,f_0), \qquad P_n m(s\mid\hat\theta_n,\hat f_n) \ge P_n m(s\mid\theta_0,f_0),
$$

and so if we show that $\mathcal M = \{m(\cdot\mid\theta,f): (\theta,f)\in(\Theta,\mathcal F)\}$ is a $P$-Glivenko–Cantelli class, then by Theorem 5.8 in [39],

$$
d(\hat\theta_n,\hat f_n;\theta_0,f_0) \stackrel{a.s.}{\longrightarrow} 0.
$$

The above gives the desired result.

Now we show that $\mathcal M$ is $P$-Glivenko–Cantelli. For any function $g$, let $\|g\|_{L_1(P)} = \int |g(s)|\,P(ds)$, and let $N_{[\,]}(\epsilon,\mathcal M,L_1(P))$ be the minimum number of $\epsilon$-brackets needed to cover $\mathcal M$ under the norm $\|\cdot\|_{L_1(P)}$. We need to evaluate $N_{[\,]}(\epsilon,\mathcal M,L_1(P))$ for $\epsilon>0$.

By (C5), there are $0 < C_j < \infty$ $(j=1,2)$ such that for all $\theta_1,\theta_2\in\Theta$ and $f_1,f_2\in\mathcal F$,

$$
\|m(\cdot\mid\theta_1,f_1) - m(\cdot\mid\theta_2,f_2)\|_{L_1(P)} \le \sup_s |m(s\mid\theta_1,f_1) - m(s\mid\theta_2,f_2)| \le C_1\|\theta_1-\theta_2\| + C_2\|f_1-f_2\|.
$$

Consequently,

$$
N_{[\,]}(\epsilon,\mathcal M,L_1(P)) \le N_{[\,]}\Big(\frac{\epsilon}{2C_1},\Theta,\|\cdot\|\Big) \times N_{[\,]}\Big(\frac{\epsilon}{2C_2},\mathcal F,L_1(Q)\Big), \quad \epsilon>0.
$$

By (C1), $N_{[\,]}(\epsilon/(2C_1),\Theta,L_1(P)) = O(1/\epsilon^r)$, with $r = \dim(\Theta)$. Let $\mathcal F_- = \{f(\cdot)I(\cdot \le 0): f\in\mathcal F\}$ and $\mathcal F_+ = \{f(\cdot)I(\cdot > 0): f\in\mathcal F\}$. Then $\mathcal F = \mathcal F_- \times \mathcal F_+$ in the following sense: for every $f\in\mathcal F$, there are $f_-\in\mathcal F_-$ and $f_+\in\mathcal F_+$ such that $f(\cdot) = f_-(\cdot) + f_+(\cdot)$. Thus,

$$
N_{[\,]}(\epsilon,\mathcal F,L_r(Q)) \le N_{[\,]}\Big(\frac{\epsilon}{2},\mathcal F_-,L_r(Q)\Big) \times N_{[\,]}\Big(\frac{\epsilon}{2},\mathcal F_+,L_r(Q)\Big), \quad \epsilon>0.
$$

Because $\mathcal F_-$ and $\mathcal F_+$ are collections of uniformly bounded (by (C3)) monotone functions, by Theorem 2.7.5 in [40], for some constant $0<C<\infty$, for any probability measure $Q$ and any $r\ge 1$,

$$
\max\big\{N_{[\,]}(\epsilon,\mathcal F_-,L_r(Q)),\, N_{[\,]}(\epsilon,\mathcal F_+,L_r(Q))\big\} \le \exp\{C/\epsilon\}, \quad \epsilon>0.
$$

Thus, for some generic constant $0<C<\infty$,

$$
N_{[\,]}(\epsilon,\mathcal M,L_1(P)) \le N_{[\,]}\Big(\frac{\epsilon}{2C_1},\Theta,\|\cdot\|\Big) \times N_{[\,]}\Big(\frac{\epsilon}{4C_2},\mathcal F_-,L_r(Q)\Big) \times N_{[\,]}\Big(\frac{\epsilon}{4C_2},\mathcal F_+,L_r(Q)\Big) \le C\epsilon^{-r}\exp\{2C/\epsilon\} < \infty, \quad \epsilon>0,
$$

and so by Theorem 2.4.1 in [40], is a Glivenko–Cantelli class with respect to P.

Lemma 1

Assume (C1)–(C5); then

$$
\|\hat\theta_n - \theta_0\| + \|\hat f_n - f_0\| = O_p(n^{-1/3}).
$$

The proof is similar to that of Lemma 1 in [41], and it is omitted here.

Proof of Theorem 2

The proof is similar to that of Theorem 2 in [41] and is omitted.

In the proof of Theorem 3 below, we need the characterization of $\hat f_n$ via the isotonic technique. Letting $k\to\infty$ in (6) of the M-step and assuming convergence of the EM algorithm, we obtain

$$
\hat f_n(\cdot) = \arg\max_{f\in\mathcal F} \sum_{i=1}^n \lambda_1^{2-t_i}(x_i)\,\lambda_2^{t_i-1}(x_i)\,\log f\big(y_i - \hat\beta_n'x_i - (2-t_i)\hat\mu_1 - (t_i-1)\hat\mu_2\big).
$$

Let $z_i = (2-t_i, t_i-1)'$ and $\hat\mu_n = (\hat\mu_1, \hat\mu_2)'$, and denote $\lambda(x_i,\eta_i) = \lambda_1^{2-t_i}(x_i)\,\lambda_2^{t_i-1}(x_i)$. Then $\hat f_n$ is re-written as

$$
(\mathrm{A.1})\qquad \hat f_n(\cdot) = \arg\max_{f\in\mathcal F} \sum_{i=1}^n \lambda(x_i,\eta_i)\,\log f\big(y_i - \hat\beta_n'x_i - \hat\mu_n'z_i\big).
$$
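Weighted problems of the form (A.1) restricted to monotone densities are solved by the pool-adjacent-violators algorithm (PAVA) [25], [27]. As an illustration only (not the authors' implementation), the weighted NPMLE of a non-increasing density on [0, ∞), i.e. one half of the unimodal problem with the mode fixed at 0, reduces to a weighted isotonic fit of the slopes of the weighted empirical CDF; all names below are hypothetical.

```python
import numpy as np

def pava_decreasing(y, w):
    """Weighted pool-adjacent-violators: best non-increasing fit to y with weights w."""
    vals, wts, cnt = [], [], []
    for yi, wi in zip(y, w):
        vals.append(float(yi)); wts.append(float(wi)); cnt.append(1)
        # pool adjacent blocks while the non-increasing constraint is violated
        while len(vals) > 1 and vals[-2] < vals[-1]:
            vals[-2] = (vals[-1] * wts[-1] + vals[-2] * wts[-2]) / (wts[-1] + wts[-2])
            wts[-2] += wts[-1]; cnt[-2] += cnt[-1]
            vals.pop(); wts.pop(); cnt.pop()
    return np.repeat(vals, cnt)

def weighted_grenander(e, w):
    """Weighted Grenander-type NPMLE of a non-increasing density on [0, inf).

    e: distinct positive support points (e.g. residuals y_i - b'x_i - m'z_i);
    w: nonnegative weights (e.g. the E-step probabilities lambda(x_i, eta_i)).
    Returns (density value on each interval (e_{i-1}, e_i], sorted points).
    """
    order = np.argsort(e)
    e = np.asarray(e, float)[order]
    w = np.asarray(w, float)[order]
    w = w / w.sum()                              # normalized jumps of the weighted CDF
    gaps = np.diff(np.concatenate(([0.0], e)))
    raw = w / gaps                               # raw slopes of the weighted empirical CDF
    return pava_decreasing(raw, gaps), e
```

The returned step function is non-increasing and integrates to one; in the M-step, the weights would be the current E-step probabilities and the support points the current residuals.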

Proof of Theorem 3

We only prove the result for $t\in\mathbb R_+$; the case $t\in\mathbb R_-$ is parallel. Define the process

$$
\hat S_n(a) = \arg\max_s\{F_n(s) - as\}, \quad a\in\mathbb R_+.
$$

By Lemma 1 and the argument in Example 3.2.14 in [40],

$$
\hat f_n(t) \le a \quad\text{if and only if}\quad \hat S_n(a) \le t, \quad\text{for all } t, a\in\mathbb R_+.
$$

We evaluate the distribution function of $n^{1/3}(\hat f_n(t) - f_0(t))$. By the above relationship, this distribution function, for all $x\in\mathbb R$, is given by

$$
P\big(n^{1/3}(\hat f_n(t) - f_0(t)) \le x\big) = P\big(\hat S_n(f_0(t) + xn^{-1/3}) - t \le 0\big).
$$

As in [42],

$$
\begin{aligned}
\hat S_n(f_0(t)+xn^{-1/3}) - t
&= n^{-1/3}\arg\max_h\big\{F_n(t+hn^{-1/3}) - (f_0(t)+xn^{-1/3})(t+hn^{-1/3})\big\}\\
&= n^{-1/3}\arg\max_h\big\{F_n(t+hn^{-1/3}) - f_0(t)hn^{-1/3} - xhn^{-2/3}\big\}\\
&= n^{-1/3}\arg\max_h\big\{n^{2/3}F_n(t+hn^{-1/3}) - f_0(t)hn^{1/3} - xh\big\}\\
&= n^{-1/3}\arg\max_h\big\{n^{2/3}\big[F_n(t+hn^{-1/3}) - F_n(t)\big] - f_0(t)hn^{1/3} - xh\big\}\\
&:= n^{-1/3}\arg\max_h B_n(h).
\end{aligned}
$$

Thus,

$$
P\big(n^{1/3}(\hat f_n(t) - f_0(t)) \le x\big) = P\big(\arg\max_h B_n(h) \le 0\big).
$$

Below, we only need to evaluate the asymptotic probability of the event $\{\arg\max_h B_n(h) \le 0\}$. The proof is similar to that of Theorem 3 in [41] and is omitted.

References

1. Simon, R, Maitournam, A. Evaluating the efficiency of targeted designs for randomized clinical trials. Clin Cancer Res 2004;10:6759–63. https://doi.org/10.1158/1078-0432.CCR-04-0496.

2. Maitournam, A, Simon, R. On the efficiency of targeted clinical trials. Stat Med 2005;24:329–39. https://doi.org/10.1002/sim.1975.

3. O’Brien, PC, Fleming, TR. A multiple testing procedure for clinical trials. Biometrics 1979;35:549–56. https://doi.org/10.2307/2530245.

4. Tan, M, Xiong, X, Kutner, MH. Clinical trial designs based on sequential conditional probability ratio tests and reverse stochastic curtailing. Biometrics 1998;54:682–95. https://doi.org/10.2307/3109774.

5. Jennison, C, Turnbull, BW. Group sequential methods with applications to clinical trials. Boca Raton, FL: CRC Press; 2000. https://doi.org/10.1201/9780367805326.

6. Zhou, X, Liu, S, Kim, ES, Herbst, RS, Lee, JJ. Bayesian adaptive design for targeted therapy development in lung cancer – a step toward personalized medicine. Clin Trials 2008;5:181–93. https://doi.org/10.1177/1740774508091815.

7. Lee, J, Gu, X, Liu, S. Bayesian adaptive randomized designs for targeted agent development. Clin Trials 2010;7:358–74. https://doi.org/10.1177/1740774510373120.

8. Tang, L, Zhou, XH. A general framework of marker designs with optimal allocation to assess clinical utility. Stat Med 2013;32:620–30. https://doi.org/10.1002/sim.5531.

9. Gao, Z, Roy, A, Tan, M. Multistage adaptive biomarker-directed targeted design for randomized clinical trials. Contemp Clin Trials 2015;42:119–31. https://doi.org/10.1016/j.cct.2015.03.001.

10. Ruberg, SJ, Chen, L, Wang, Y. The mean doesn’t mean as much any more: finding sub-groups for tailored therapeutics. Clin Trials 2010;7:574–83. https://doi.org/10.1177/1740774510369350.

11. Fan, A, Song, R, Lu, W. Change-plane analysis for subgroup detection and sample size calculation. J Am Stat Assoc 2017;112:769–78. https://doi.org/10.1080/01621459.2016.1166115.

12. Yuan, A, Chen, X, Zhou, Y, Tan, MT. Subgroup analysis with semiparametric models toward precision medicine. Stat Med 2018;37:1830–45. https://doi.org/10.1002/sim.7638.

13. Shen, J, He, X. Inference for subgroup analysis with a structured logistic-normal mixture model. J Am Stat Assoc 2015;110:303–12. https://doi.org/10.1080/01621459.2014.894763.

14. Hanson, T, Johnson, WO. Modelling regression error with a mixture of Polya trees. J Am Stat Assoc 2002;97:1020–33. https://doi.org/10.1198/016214502388618843.

15. Atsedeweyn, AA, Rao, KS. Linear regression model with generalized new symmetric error distribution. Math Theor Model 2014;4:48–73.

16. Yuan, A, De Gooijer, J. Semiparametric regression with kernel error model. Scand J Stat 2007;34:841–69. https://doi.org/10.2139/ssrn.915340.

17. Yao, W, Zhao, Z. Kernel density-based linear regression estimate. Commun Stat Theor Methods 2013;42:4499–512. https://doi.org/10.1080/03610926.2011.650269.

18. Rufibach, K. Log-concave density estimation and bump hunting for i.i.d. observations. Ph.D. thesis, Universities of Bern and Göttingen; 2006.

19. Dümbgen, L, Rufibach, K. Maximum likelihood estimation of a log-concave density and its distribution function: basic properties and uniform consistency. Bernoulli 2009;15:40–68. https://doi.org/10.3150/08-BEJ141.

20. Balabdaoui, F, Rufibach, K, Wellner, J. Limit distribution theory for maximum likelihood estimation of a log-concave density. Ann Stat 2009;37:1299–331. https://doi.org/10.1214/08-AOS609.

21. Qin, J, Garcia, TP, Ma, Y, et al. Combining isotonic regression and EM algorithm to predict risk under monotonicity constraint. Ann Appl Stat 2014;8:1182–208. https://doi.org/10.1214/14-AOAS730.

22. Bordes, L, Mottelet, S, Vandekerkhove, P. Semiparametric estimation of a two-component mixture model. Ann Stat 2006;34:1204–32. https://doi.org/10.1214/009053606000000353.

23. Hohmann, D, Holzmann, H. Two-component mixtures with independent coordinates as conditional mixtures: nonparametric identification and estimation. Electron J Stat 2013;7:859–80. https://doi.org/10.1214/13-EJS792.

24. Dempster, AP, Laird, NM, Rubin, DB. Maximum likelihood from incomplete data via the EM algorithm. J Roy Stat Soc B 1977;39:1–38. https://doi.org/10.1111/j.2517-6161.1977.tb01600.x.

25. Robertson, T, Wright, FT, Dykstra, R. Order restricted statistical inference. Chichester: John Wiley & Sons; 1988.

26. Best, MJ, Chakravarti, N. Active set algorithms for isotonic regression; a unifying framework. Math Program 1990;47:425–39. https://doi.org/10.1007/BF01580873.

27. de Leeuw, J, Hornik, K, Mair, P. Isotone optimization in R: pool-adjacent-violators algorithm (PAVA) and active set methods. J Stat Software 2009;32:1–24. https://doi.org/10.18637/jss.v032.i05.

28. Wu, CJ. On the convergence properties of the EM algorithm. Ann Stat 1983;11:95–103. https://doi.org/10.1214/aos/1176346060.

29. Tan, M, Tian, G-L, Ng, KW. Bayesian missing data problems: EM, data augmentation and non-iterative computation. Boca Raton, FL: Chapman and Hall/CRC; 2009. https://doi.org/10.1201/9781420077506.

30. Munoz, A. Nonparametric estimation from censored bivariate observations. Technical Report, Department of Statistics, Stanford University; 1980.

31. Campbell, G. Nonparametric bivariate estimation with randomly censored data. Biometrika 1981;68:417–22. https://doi.org/10.1093/biomet/68.2.417.

32. Hanley, JA, Parnes, MN. Nonparametric estimation of a multivariate distribution in the presence of censoring. Biometrics 1983;39:129–39. https://doi.org/10.2307/2530813.

33. Chen, J, Zhang, D, Davidian, M. A Monte Carlo EM algorithm for generalized linear mixed models with flexible random effects distribution. Biostatistics 2002;3:347–60. https://doi.org/10.1093/biostatistics/3.3.347.

34. Bordes, L, Chauveau, D, Vandekerkhove, P. A stochastic EM algorithm for a semiparametric mixture model. Comput Stat Data Anal 2007;51:5429–43. https://doi.org/10.1016/j.csda.2006.08.015.

35. Balan, TA, Putter, H. frailtyEM: an R package for estimating semiparametric shared frailty models; 2017.

36. Huang, J, Wellner, JA. Interval censored survival data: a review of recent progress. In: Lin, D, Fleming, T, editors. Proceedings of the first Seattle symposium in biostatistics: survival analysis. New York: Springer-Verlag; 1997:123–69. https://doi.org/10.1007/978-1-4684-6316-3_8.

37. Murphy, SA, van der Vaart, AW, Wellner, JA. Current status regression. Math Methods Stat 1999;8:407–25.

38. Groeneboom, P, Hendrickx, K. Current status linear regression. Manuscript; 2018. https://doi.org/10.1214/17-AOS1589.

39. van der Vaart, A. Semiparametric statistics. Lecture notes in mathematics. Springer; 2002.

40. van der Vaart, A, Wellner, J. Weak convergence and empirical processes. New York: Springer; 1996. https://doi.org/10.1007/978-1-4757-2545-2.

41. Diao, G, Yuan, A. A class of semiparametric cure models with current status data. Lifetime Data Anal 2018;28:26–51. https://doi.org/10.1007/s10985-018-9420-0.


Supplementary material

The online version of this article offers supplementary material (https://doi.org/10.1515/ijb-2018-0100).


Received: 2018-09-30
Accepted: 2020-09-14
Published Online: 2020-10-08

© 2020 Walter de Gruyter GmbH, Berlin/Boston
