1 First design problem

1.1 The problem

This problem was posed by Valerii Fedorov at the workshop on Design and Analysis of Experiments in Healthcare held at the Isaac Newton Institute for Mathematical Sciences at Cambridge, UK in July 2015. The context is basket trials, where several different drugs are tested on several different diseases in a single protocol which involves many medical centres: see Derhaschnig et al. (2016) and Woodcock and LaVange (2017). The combinatorial properties listed below have been proposed by Fedorov and Leonov (2019) as potentially giving optimal designs, which may give a benchmark for designs which are achievable in practice.

A trial is being designed to compare several drugs for their effects on several different types of cancer. In order to keep the protocol simple for each medical centre involved, it is proposed to limit each medical centre to only a few of the cancer types and only a few of the drugs. For each cancer type at that medical centre, each patient will be allocated to one of the drugs at that medical centre, the aim being that the numbers of such patients on each drug are nearly equal.

Let \(v_1\) be the number of cancer types, \(v_2\) the number of drugs, and b the number of medical centres. The properties listed below are desirable. The first two are to keep the protocol simple. Fedorov and Leonov (2019) propose several statistical models for the response of each patient. The simplest is additive in the effects of medical centre, cancer type and drug. It is not known a priori how many suitable patients will enrol at each medical centre. If there are the same number at each medical centre then conditions (c)–(e) give a design that is optimal in the sense of minimizing the variances of the estimators of parameters of interest: see Sect. 1.4.

(a) all medical centres involve the same number, say \(k_1\), of cancer types, where \(k_1<v_1\);

(b) all medical centres use the same number, say \(k_2\), of drugs, where \(k_2<v_2\);

(c) each pair of distinct cancer types are involved together at the same non-zero number, say \(\lambda _{11}\), of medical centres;

(d) each pair of distinct drugs are used together at the same non-zero number, say \(\lambda _{22}\), of medical centres;

(e) each drug is used on each type of cancer at the same number, say \(\lambda _{12}\), of medical centres.

The inequalities in conditions (a) and (b) force the medical centres to be incomplete both for cancer types and for drugs. Insisting that the parameters in conditions (c) and (d) are non-zero is necessary to prevent the confounding of either cancer types or drugs with medical centres.

For brevity, from now on the medical centres will be referred to as blocks. Figure 1 shows such a design for six cancer types and five drugs using 10 blocks; it has \(k_1=3\) and \(k_2=2\).

Fig. 1 Design for 6 cancer types and 5 drugs, using 10 blocks; each block has 3 cancer types and 2 drugs

Conditions (a) and (c) specify that the design for cancer types is a balanced incomplete-block design, also known as a 2-design, or, more specifically, a 2-\((v_1,k_1,\lambda _{11})\) design. Likewise, conditions (b) and (d) specify that the design for drugs is a 2-design. We call these the C-design and the D-design respectively.

We shall call a design satisfying conditions (a)–(e) a 2-part 2-design or 2-part balanced incomplete-block design. These are not the same as the bipartite designs defined by Hoffman and Liatti (1995).
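Conditions (a)–(e) can be checked mechanically from a list of blocks. The following Python sketch is illustrative only: the function names, the labelling of cancer types and drugs by integers, and the toy design (all pairs of three cancer types crossed with all triples of four drugs) are assumptions made for this example, and the toy design is not the design in Fig. 1.

```python
from itertools import combinations

def concurrences(blocks, v1, v2):
    """Sets of values taken by k1, k2, lambda11, lambda22 and lambda12.

    Each block is a pair (cancer types, drugs); cancer types are labelled
    0..v1-1 and drugs 0..v2-1 (a labelling assumed for this sketch).
    """
    k1 = {len(c) for c, d in blocks}
    k2 = {len(d) for c, d in blocks}
    lam11 = {sum(1 for c, d in blocks if i in c and j in c)
             for i, j in combinations(range(v1), 2)}
    lam22 = {sum(1 for c, d in blocks if i in d and j in d)
             for i, j in combinations(range(v2), 2)}
    lam12 = {sum(1 for c, d in blocks if i in c and j in d)
             for i in range(v1) for j in range(v2)}
    return k1, k2, lam11, lam22, lam12

def is_two_part_2_design(blocks, v1, v2):
    """True if conditions (a)-(e) all hold."""
    k1, k2, lam11, lam22, lam12 = concurrences(blocks, v1, v2)
    if any(len(s) != 1 for s in (k1, k2, lam11, lam22, lam12)):
        return False                                  # some count is not constant
    incomplete = max(k1) < v1 and max(k2) < v2        # (a) and (b)
    non_zero = min(lam11) > 0 and min(lam22) > 0      # (c) and (d)
    return incomplete and non_zero

# Hypothetical toy design: 3 cancer types, 4 drugs, 12 blocks.
cancer_pairs = [frozenset(p) for p in combinations(range(3), 2)]
drug_triples = [frozenset(t) for t in combinations(range(4), 3)]
toy = [(c, d) for c in cancer_pairs for d in drug_triples]

print(is_two_part_2_design(toy, 3, 4))        # True
print(is_two_part_2_design(toy[:-1], 3, 4))   # False: balance is lost
```

Removing a single block from the toy design destroys the constancy of the concurrences, which is why the second call fails.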

1.2 Previous work

In Sect. 2 we concentrate on designs with only two different factors (cancer types and drugs), before generalizing to three or more factors in Sect. 3. This is partly to help the reader to become familiar with the ideas, and partly because this case seems likely to be of practical importance in the clinical context described.

The more general case has already been considered by Sitter (1993), Mukerjee (1998) and Hedayat et al. (1999, Sect. 10.8). Because conditions (a)–(d) specify balanced incomplete-block designs and condition (e) is reminiscent of the definition of orthogonal multi-array given by Brickell (1984), Sitter (1993) called these designs balanced orthogonal multi-arrays. Brickell’s original definition was essentially a generalization of orthogonal arrays of strength two and minimal size, so it included the conditions that b is a square and \(\lambda _{12}=1\). Sitter (1993) acknowledged that he was removing those conditions.

However, the original definition of orthogonal multi-array continues to be in use in many areas. They give an alternative definition of semi-Latin squares: see Bailey (1992) and Soicher (1999, 2013). Dually, they are used in factorial designs: see Bailey (2011). Phillips and Wallis (1996) used them in the study of tournaments. They are used in cryptography: see, for example, Anthony et al. (1990) and Martin et al. (1992). Recently, Li et al. (2015) have generalized them to strength t, so that b is a t-th power of an integer. This generalization seems to be within the spirit of the original definition, whereas Sitter’s does not.

Thus we think that “2-part 2-design” (or, more generally, a multi-part 2-design) is a more suitable name.

Sitter (1993) also allowed the block size within each factor to vary. Mukerjee (1998) called the balanced orthogonal multi-arrays proper when this is not allowed. He also restricted attention to the case where \(k_i< v_i\), unlike Sitter (1993). Both allowed \(\lambda _{ii}\) to be zero, which permits confounding: in Table 1 of Mukerjee (1998) one factor has its levels confounded with blocks.

Mukerjee (1998) gave two general constructions for designs of this type. We shall comment on the relationship of these to our constructions at the relevant places.

1.3 Representing the designs

How should we represent a design of this type? Each block has all combinations of \(k_1\) cancer types with \(k_2\) drugs, so a full display would show \(bk_1k_2\) items. For example, in the design in Fig. 1, Block 1 contains the ordered pairs

$$\begin{aligned} (\text{ C1 }, \text{ D1 }), \quad (\text{ C1 }, \text{ D5 }), \quad (\text{ C2 }, \text{ D1 }), \quad (\text{ C2 }, \text{ D5 }), \quad (\text{ C3 }, \text{ D1 }), \quad (\text{ C3 }, \text{ D5 }). \end{aligned}$$

It might be clearer to show these in rectangular form:

$$\begin{aligned} \begin{array}{cc} (\text{ C1 }, \text{ D1 }) &{} (\text{ C1 }, \text{ D5 }) \\ (\text{ C2 }, \text{ D1 }) &{} (\text{ C2 }, \text{ D5 })\\ (\text{ C3 }, \text{ D1 }) &{} (\text{ C3 }, \text{ D5 }) \end{array} \end{aligned}$$

The people running the clinical trial need this full representation.

A dual way to represent the design is to use a \(v_1 \times v_2\) rectangle with \(\lambda _{12}\) entries per cell. Equation (3) below shows that this contains the same number of items as the full representation. The rows are labelled by the cancer types, and the columns by the drugs. The name of each block is shown in each cell \((i,j)\) for which the combination of cancer type i and drug j occurs in that block. Figure 2 shows the design in Fig. 1 in this format. This dual representation does not extend easily to the generalization of the problem in Sects. 3–4.

Fig. 2 Dual representation of the design in Fig. 1: the rows and columns of the rectangle are labelled by cancer types and drugs respectively, and each entry in each cell is the name of a block

Fig. 3 Concise representation of the design in Fig. 1

The most concise way to represent the design is simply to list, for each block, the cancer types and drugs allocated to it. This list has \(b(k_1+k_2)\) items. This representation was used by Sitter (1993) and Mukerjee (1998). Figure 3 gives the concise representation of the design in Fig. 1.

We shall use the concise representation for the remainder of this paper. However, it can be misinterpreted when removed from the practical context. For example, the reader might think that Block 1 in Fig. 3 contains five treatments, those in the union of the sets \(\{\text{ C1 },\text{ C2 },\text{ C3 }\}\) and \(\{\text{ D1 },\text{ D5 }\}\), rather than the six treatment combinations in the cartesian product of these sets. This misinterpretation gives a block design for \(v_1+v_2\) treatments in b blocks of size \(k_1+k_2\), which we call the zipped form of the original design.

Figure 1 avoids this problem, but at the cost of repeating the information about the drugs in each block. This format contains \(bk_1k_2\) items, as many as the full representation, but it seems easier to read.
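The item counts quoted for the different representations can be reproduced on a small example. The Python sketch below is an assumption-laden illustration: the function names are hypothetical, and the toy design (all pairs of three cancer types crossed with all triples of four drugs, so \(b=12\), \(k_1=2\), \(k_2=3\)) is not one of the designs in the figures.

```python
from itertools import combinations, product

def full_representation(blocks):
    """Each block as the list of all (cancer type, drug) combinations."""
    return [sorted(product(c, d)) for c, d in blocks]

def dual_representation(blocks, v1, v2):
    """Cell (i, j) holds the names of the blocks where type i meets drug j."""
    cells = {(i, j): [] for i in range(v1) for j in range(v2)}
    for name, (c, d) in enumerate(blocks, start=1):
        for i, j in product(c, d):
            cells[i, j].append(name)
    return cells

def zipped_form(blocks, v1):
    """The misreading: one set per block, with drug j relabelled as v1 + j."""
    return [set(c) | {v1 + j for j in d} for c, d in blocks]

# Hypothetical toy design in the concise representation.
cancer_pairs = list(combinations(range(3), 2))
drug_triples = list(combinations(range(4), 3))
toy = [(c, d) for c in cancer_pairs for d in drug_triples]   # b = 12

full_items = sum(len(blk) for blk in full_representation(toy))
dual_items = sum(len(cell) for cell in dual_representation(toy, 3, 4).values())
concise_items = sum(len(c) + len(d) for c, d in toy)
print(full_items, dual_items, concise_items)   # 72 72 60
```

Here the full and dual forms both hold \(bk_1k_2=72\) items, the concise form holds \(b(k_1+k_2)=60\), and every block of the zipped form has size \(k_1+k_2=5\).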

In the literature about block designs, the incidence matrix has \((i,j)\)-entry equal to the number of times that treatment i occurs in block j: see, for example, John and Williams (1995); Caliński and Kageyama (2000); Bailey and Cameron (2009). Let \(N_1\) be the \(v_1 \times b\) incidence matrix of cancer types in blocks in the zipped form of the design. The \((i,j)\)-entry is 1 if cancer type i occurs in block j; otherwise, it is 0. Let \(N_2\) be the analogous \(v_2 \times b\) incidence matrix for drugs in blocks. Then the incidence matrices for the full design are \(k_2N_1\) and \(k_1N_2\) respectively, not allowing for the unknown number of times that each combination will eventually be used in any block.
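As a rough illustration of these incidence matrices, the following Python sketch builds \(N_1\) and \(N_2\) for a small hypothetical design and forms the products \(N_1N_1^\top \), \(N_2N_2^\top \) and \(N_1N_2^\top \). The helper names and the toy design (pairs of three cancer types crossed with triples of four drugs) are assumptions for this example only.

```python
from itertools import combinations

def incidence(blocks, v, part):
    """v x b 0/1 matrix; part 0 selects cancer types, part 1 selects drugs."""
    return [[1 if i in blk[part] else 0 for blk in blocks] for i in range(v)]

def times_transpose(A, B):
    """Matrix product A B^T for two lists of rows of equal length."""
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in B] for ra in A]

# Hypothetical toy design: b = 12, k1 = 2, k2 = 3.
cancer_pairs = [frozenset(p) for p in combinations(range(3), 2)]
drug_triples = [frozenset(t) for t in combinations(range(4), 3)]
toy = [(c, d) for c in cancer_pairs for d in drug_triples]

N1 = incidence(toy, 3, 0)
N2 = incidence(toy, 4, 1)

print(times_transpose(N1, N1))   # [[8, 4, 4], [4, 8, 4], [4, 4, 8]]
print(times_transpose(N1, N2))   # every entry is 6
```

For this toy design the diagonal entries of \(N_1N_1^\top \) are the replication 8 of each cancer type, the off-diagonal entries are \(\lambda _{11}=4\), and \(N_1N_2^\top \) is the constant matrix with entries \(\lambda _{12}=6\).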

1.4 Comparison with other designs

At first sight, the design in Fig. 1 appears to be a block design for two treatment factors C and D. However, there are important differences between this and previous designs. In our application, the medical centre represented by Block 1 will accept into the trial only patients with cancer types 1, 2 or 3. It has no control in advance over how many such patients will present themselves. For each of these three cancer types, it will randomize approximately equal numbers of patients to drugs 1 and 5. In the original proposal, the listed drugs include placebo. In later variants, placebo is not listed, and patients should be randomized approximately equally to drugs 1 and 5 and placebo, or approximately one quarter each to placebo, drug 1, drug 5 and their combination.

Sitter (1993) introduced his designs for use in sampling. In designed experiments, Mukerjee (1998) envisaged a completely different sort of application from the one we describe here. In that, each block represents a single observational unit. For each factor, subsets of the levels are applied, rather than single levels. For example, a group of \(k_1\) people might be needed, all playing similar roles, or a hybrid variety of wheat might be bred from \(k_2\) pure lines. See also Bailey (1992). In this context, it is not problematic to have \(\lambda _{ii}=0\) (so that \(k_i=1\)) for either \(i=1\) or \(i=2\).

In classical factorial designs with blocks of size k, from Yates (1933), Fisher (1935, 1942) and Bose (1947) onwards, combinations of factor levels do not occur more than once in any block: thus \(k_1=k_2=k\). Moreover, the subsets of combinations allocated to blocks are chosen depending on various assumptions about main effects and interactions. For example, if \(v_1=v_2=3\) and there are six blocks of three plots each then the design in Fig. 4 permits estimation of both main effects with full efficiency and all interaction contrasts with efficiency factor 1/2.

Fig. 4 Usual representation of a classical factorial design for two 3-level treatment factors in 6 blocks of size 3

Fig. 5 Dual representation of the factorial design in Fig. 4

The dual form of this design is shown in Fig. 5. The positions of the block names show clearly how the block design was constructed from a pair of mutually orthogonal Latin squares. However, unlike in Fig. 2, no block name occurs more than once in any row or column. Consequently, the occurrences of each block name do not have the rectangular layout that they do in Fig. 2.

Later in the twentieth century there was much literature on incomplete-block designs for two non-interacting treatment factors with each treatment combination occurring once, so that \(v_1v_2=bk\) and \(k_1=k_2=k\), where k is the block size. For example, Preece (1966b) gave the design in Fig. 6. The dual form is in Fig. 7.

Fig. 6 Block design for two non-interacting sets of treatments, with \(v_1=6\), \(v_2=5\), \(b=10\) and \(k=3\)

Fig. 7 Dual representation of the factorial design in Fig. 6

Many authors required the block design for each treatment factor separately to be balanced. This is the analogue of conditions (a)–(d) when \(k_1=k_2\). From Agrawal (1966) and Preece (1966a) onwards, another condition was often imposed, eventually called adjusted orthogonality by Eccleston and Russell (1977): the product \({\tilde{N}}_1{\tilde{N}}_2^\top \) should have all its entries equal, where \({\tilde{N}}_1\) and \({\tilde{N}}_2\) are the \(v_1\times b\) and \(v_2\times b\) incidence matrices for the first and second treatment factors, respectively, in blocks. Although this is a consequence of condition (e), it is not equivalent to it. The duals of designs satisfying these conditions were called triple arrays by McSorley et al. (2005). The design in Fig. 7 is a triple array.

The statistical relevance of adjusted orthogonality is discussed in Bailey (2017, Sects. 7–8). If all medical centres recruit the same number of patients then, under condition (b) and the first version of the proposal, condition (d) gives a design which is optimal for the estimation of drug effects in the model which excludes the effects of cancer types. These estimates are obtained by adjusting for block effects. Adjusted orthogonality implies that, under the additive model for all three effects, once the responses have been adjusted for block effects then drugs are orthogonal to cancer types and so no further adjustment is needed. Hence conditions (d) and (e) give a design optimal for the estimation of drug effects under condition (b). Likewise, conditions (c) and (e) give a design optimal for the estimation of cancer effects under condition (a).

In spite of the similar conditions that they satisfy, triple arrays are not special cases of 2-part 2-designs, nor vice versa. In a triple array, no block name occurs more than once in any row or column. In the dual form of a 2-part 2-design, any block name that occurs in a given row must occur \(k_2\) times in that row. A consequence of the “non-zero” part of condition (d) is that \(k_2>1\).

Apart from the designs given by Preece et al. (2005), infinite families of triple arrays have proved frustratingly hard to find: see Bailey (2017, Sect. 13). By contrast, in Sects. 2 and 4 of this paper we give many simple constructions of 2-part 2-designs and their generalizations.

1.5 Conditions on parameters

An ordinary block design is said to be \(\alpha \)-resolved if its set of blocks can be partitioned into classes in such a way that each treatment occurs \(\alpha \) times in each class. This terminology does not extend easily to 2-part 2-designs, because cancer types may occur in different numbers of blocks from drugs. We propose calling a 2-part block design c-partitionable if the set of blocks can be grouped into c classes of b/c blocks each, in such a way that every cancer type occurs the same number of times in each class and every drug occurs the same number of times in each class. It is convenient to extend this terminology to ordinary block designs: such a design with replication r is \(\alpha \)-resolved if and only if it is c-partitionable, where \(\alpha c =r\).

Theorem 1

If there is a 2-part 2-design with the parameters given in conditions (a)–(e), then each cancer type occurs in \(r_1\) blocks and each drug occurs in \(r_2\) blocks, where

$$\begin{aligned} r_1=bk_1/v_1, \qquad r_2=bk_2/v_2. \end{aligned}$$
(1)

Moreover, the following equations are satisfied:

$$\begin{aligned} v_1(v_1-1)\lambda _{11}=bk_1(k_1-1), \qquad v_2(v_2-1)\lambda _{22}=bk_2(k_2-1), \end{aligned}$$
(2)

and

$$\begin{aligned} bk_1k_2=v_1v_2\lambda _{12}, \end{aligned}$$
(3)

as well as the inequality

$$\begin{aligned} b\ge v_1+v_2-1. \end{aligned}$$
(4)

If the design is c-partitionable then

$$\begin{aligned} b \ge v_1 + v_2 +c -2. \end{aligned}$$
(5)

Proof

The first two statements are the usual conditions for the 2-designs on cancer types and drugs respectively, while Eq. (3) equates two different ways of counting the number of choices of a cancer type, a drug, and a block containing both.

For inequality (5), let \(N=(N_1^\top ,N_2^\top ,N_0^\top )^\top \), where \(N_1\) and \(N_2\) are the incidence matrices defined in Sect. 1.3 and \(N_0^\top \) is the \(b\times c\) incidence matrix of blocks in classes. Then

$$\begin{aligned} NN^\top = \left[ \begin{array}{c@{\qquad }c@{\qquad }c} (r_1-\lambda _{11})I+\lambda _{11}J &{} \lambda _{12}J &{} (r_1/c)J\\ \lambda _{12}J &{} (r_2-\lambda _{22})I+\lambda _{22}J &{} (r_2/c)J\\ (r_1/c)J &{} (r_2/c)J &{} (b/c)I \end{array} \right] , \end{aligned}$$

where I and J are identity and all-1 matrices of the appropriate sizes.

We claim that \(NN^\top \) has rank \(v_1+v_2+c-2\), from which inequality (5) follows. First, let \(\mathbf {w}_1\), \(\mathbf {w}_2\) and \(\mathbf {w}_3\) be column vectors of lengths \(v_1\), \(v_2\), c respectively whose entries sum to 0. Then

$$\begin{aligned} NN^\top \left( \begin{array}{c} \mathbf {w}_1\\ \mathbf {w}_2\\ \mathbf {w}_3\end{array} \right) = \left( \begin{array}{c} (r_1-\lambda _{11})\mathbf {w}_1\\ (r_2-\lambda _{22})\mathbf {w}_2\\ (b/c)\mathbf {w}_3 \end{array} \right) . \end{aligned}$$
(6)

Because the blocks are incomplete, \(\lambda _{11}< r_1\) and \(\lambda _{22} <r_2\), and so the restriction of this matrix to the space of such vectors, which has dimension \(v_1+v_2+c-3\), is invertible. The orthogonal complement of this space consists of all vectors of the form \((x\mathbf {j}_1^\top ,y\mathbf {j}_2^\top ,z\mathbf {j}_3^\top )^\top \), where \(\mathbf {j}_1\), \(\mathbf {j}_2\) and \(\mathbf {j}_3\) are all-1 vectors of lengths \(v_1\), \(v_2\) and c respectively. The action of \(NN^\top \) on this space is obtained by replacing the block matrices by their row sums: using the results in (1)–(3), this simplifies to

$$\begin{aligned} \left[ \begin{array}{ccc} r_1k_1 &{} r_1k_2 &{} r_1\\ r_2k_1 &{} r_2k_2 &{} r_2\\ (b/c)k_1 &{} (b/c)k_2 &{} (b/c) \end{array} \right] , \end{aligned}$$

which has rank 1. Hence \(NN^\top \) has rank \(v_1+v_2+c-2\). Since N has only b columns, this rank is at most b, which proves inequality (5).

The first part of the theorem shows that every 2-part 2-design is 1-partitionable. Thus inequality (4) is a special case of inequality (5). \(\square \)
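The conditions in Theorem 1 are easy to screen for numerically. The Python sketch below is a hedged illustration with hypothetical function names: it computes the quantities forced by Eqs. (1)–(3) and tests inequalities (4) and (5). These are necessary conditions only, so passing them does not show that a design exists.

```python
from fractions import Fraction

def derived_parameters(b, v1, v2, k1, k2):
    """Quantities forced by Eqs. (1)-(3); None if any is not an integer."""
    values = {
        "r1": Fraction(b * k1, v1),
        "r2": Fraction(b * k2, v2),
        "lambda11": Fraction(b * k1 * (k1 - 1), v1 * (v1 - 1)),
        "lambda22": Fraction(b * k2 * (k2 - 1), v2 * (v2 - 1)),
        "lambda12": Fraction(b * k1 * k2, v1 * v2),
    }
    if any(x.denominator != 1 for x in values.values()):
        return None
    return {name: int(x) for name, x in values.items()}

def meets_bound(b, v1, v2, c=1):
    """Inequality (5); with c = 1 it reduces to inequality (4)."""
    return b >= v1 + v2 + c - 2

# Parameters of the design in Fig. 1: b = 10, v1 = 6, v2 = 5, k1 = 3, k2 = 2.
print(derived_parameters(10, 6, 5, 3, 2))
# {'r1': 5, 'r2': 4, 'lambda11': 2, 'lambda22': 1, 'lambda12': 2}
print(meets_bound(10, 6, 5))               # True, with equality in (4)
print(derived_parameters(9, 6, 5, 3, 2))   # None: r1 would be 9/2
```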

Remark 1

Mukerjee (1998) remarked on the integrality conditions (1)–(3) without stating them explicitly, and proved inequality (4).

Remark 2

Inequality (4) can be regarded as a generalization of both Fisher’s and Bose’s inequalities: see Cameron and van Lint (1991, Chap. 1) and Bailey (2008, Chap. 11). For Fisher’s inequality, take the C-design to be any 2-design with \(v=v_1\), and take a single drug which occurs in all blocks; we have \(\lambda _{12}=r_1\) and \(\lambda _{22}=0\): although our conditions that \(\lambda _{22}>0\) and \(k_2<v_2\) fail for the D-design, the proof still works, because the only vector \(\mathbf {w}_2\) in Eq. (6) is the zero vector: thus the proof gives \(b\ge v+1-1\). For Bose’s inequality, take the C-design to be any resolvable 2-design with \(v=v_1\) and replication \(r=r_1\), and the drugs to be labelled by the resolution classes of the design, with a drug in every block in the corresponding resolution class, so that \(v_2=r\). We have \(\lambda _{12}=1\) and \(\lambda _{22}=0\). Again part of condition (d) fails, but the proof works, giving \(b\ge v+r-1\). Inequality (5) seems to be the true analogue of Bose’s inequality for 2-part 2-designs.

Remark 3

Although neither triple arrays nor 2-part 2-designs are special cases of the other, they both satisfy inequality (4). Proofs for triple arrays are in Bagchi (1998), Bailey (2017) and McSorley et al. (2005), and the proof that we have given here also works for triple arrays.

2 Constructions of 2-part 2-designs

In this section, we give several constructions. In order to identify when two different constructions give designs which are essentially the same, we say that two 2-part 2-designs are isomorphic to each other if one can be obtained from the other by relabelling the blocks, cancer types and drugs. Weak isomorphism generalizes this by also allowing the roles of cancer types and drugs to be interchanged.

Given two or more non-isomorphic designs for the same parameters, there may be practical reasons for preferring one over the rest.

Since interchanging roles does not affect conditions (a)–(e), from now on we usually adopt the convention that

$$\begin{aligned} v_1\ge v_2. \end{aligned}$$
(7)

Given a 2-part 2-design, the procedure of C-swap creates a new 2-part 2-design. This simply involves replacing the set of cancer types in each block with the complementary set. This changes the parameters \(k_1\), \(\lambda _{11}\) and \(\lambda _{12}\) to \(v_1-k_1\), \(b-2r_1+\lambda _{11}\) and \(r_2-\lambda _{12}\), leaving b, \(v_1\), \(v_2\), \(k_2\) and \(\lambda _{22}\) unchanged. The new design fulfills all the conditions so long as \(v_1-k_1\ge 2\). The combination of a C-swap and the analogous D-swap has the effect of replacing each block by its complement (in the zipped form).

Thus, in our search for design constructions, we may assume that

$$\begin{aligned} \text{ for } i=1\hbox { and }i=2, \hbox {either }k_i\le v_i/2\hbox { or }k_i=v_i-1. \end{aligned}$$
(8)

All of our tables are limited to parameter sets which satisfy conditions (7) and (8).
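The C-swap described above can be sketched in a few lines of Python. The names below and the starting design (all triples of five cancer types crossed with all pairs of three drugs, so \(b=30\), \(r_1=18\) and \(r_2=20\)) are assumptions made for illustration; the sketch simply compares direct counts with the stated parameter changes.

```python
from itertools import combinations

def c_swap(blocks, v1):
    """Replace the cancer types in each block by the complementary set."""
    everything = frozenset(range(v1))
    return [(everything - frozenset(c), d) for c, d in blocks]

def parameters_after_c_swap(b, v1, k1, r1, r2, lam11, lam12):
    """New values of (k1, lambda11, lambda12), as stated in the text."""
    return v1 - k1, b - 2 * r1 + lam11, r2 - lam12

def count11(blocks, i, j):
    """Number of blocks containing both cancer types i and j."""
    return sum(1 for c, d in blocks if i in c and j in c)

def count12(blocks, i, j):
    """Number of blocks containing cancer type i and drug j."""
    return sum(1 for c, d in blocks if i in c and j in d)

# Hypothetical starting design with v1 = 5, k1 = 3, v2 = 3, k2 = 2.
triples = [frozenset(t) for t in combinations(range(5), 3)]
pairs = [frozenset(p) for p in combinations(range(3), 2)]
design = [(c, d) for c in triples for d in pairs]
swapped = c_swap(design, 5)

print(count11(design, 0, 1), count12(design, 0, 0))        # 9 12
print(count11(swapped, 0, 1), count12(swapped, 0, 0))      # 3 8
print(parameters_after_c_swap(30, 5, 3, 18, 20, 9, 12))    # (2, 3, 8)
```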

Construction 1

(Cartesian products) One obvious method of construction is the cartesian product. This starts with two balanced incomplete-block designs, one for \(v_1\) treatments in \(b_1\) blocks of size \(k_1\), the other for \(v_2\) treatments in \(b_2\) blocks of size \(k_2\). Form all \(b_1b_2\) combinations of a block of each sort. For each combination, form the cartesian product of their subsets of treatments.

This will usually result in rather large values of b. For example, when \(v_1=6\), \(k_1=3\), \(v_2=5\) and \(k_2=2\) then the smallest possible values of \(b_1\) and \(b_2\) are both 10, so this construction gives a design with 100 blocks, unlike the design with 10 blocks in Fig. 1.
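A minimal Python sketch of Construction 1 follows. The ingredient designs are assumptions chosen because they are easy to generate: the Fano plane, developed from the difference set \(\{0,1,3\}\) modulo 7, for the cancer types, and all pairs from four drugs. The helper names are hypothetical.

```python
from itertools import combinations, product

def cartesian_product_design(design1, design2):
    """Construction 1: pair every block of design1 with every block of design2."""
    return list(product(design1, design2))

def profile(blocks, v1, v2):
    """Sets of values taken by lambda11, lambda22 and lambda12."""
    lam11 = {sum(1 for c, d in blocks if i in c and j in c)
             for i, j in combinations(range(v1), 2)}
    lam22 = {sum(1 for c, d in blocks if i in d and j in d)
             for i, j in combinations(range(v2), 2)}
    lam12 = {sum(1 for c, d in blocks if i in c and j in d)
             for i in range(v1) for j in range(v2)}
    return lam11, lam22, lam12

# 2-(7,3,1) design (Fano plane) and 2-(4,2,1) design (all pairs).
fano = [frozenset((x + s) % 7 for x in (0, 1, 3)) for s in range(7)]
all_pairs = [frozenset(p) for p in combinations(range(4), 2)]

design = cartesian_product_design(fano, all_pairs)
print(len(design))              # 42
print(profile(design, 7, 4))    # ({6}, {7}, {9})
```

Each concurrence is a product of parameters of the ingredients: \(\lambda _{11}=1\times 6\), \(\lambda _{22}=1\times 7\) and \(\lambda _{12}=3\times 3\).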

Table 1 shows the parameters of the designs with the least number of blocks which can be constructed by this method with \(v_1\ge v_2\), using the table of 2-designs in Appendix I of Hall (1986); note that design 13 in that table should have \(k=4\).

Table 1 Parameter sets for the designs with the least number of blocks that can be made by Construction 1: \(v_1\) is the number of cancer types, \(v_2\) is the number of drugs, and b is the number of blocks, each of which has \(k_1\) cancer types and \(k_2\) drugs

Construction 2

(Subcartesian products) If \(k_2\) divides \(v_2\) then there may exist a resolved 2-design \(\varDelta _2\) for \(v_2\) drugs in \(b_2\) blocks of size \(k_2\) with r resolution classes. Suppose that \(\varDelta _1\) is a 2-design for \(v_1\) cancer types in \(b_1\) blocks of size \(k_1\), where \(b_1\) is a multiple of r. Now we can achieve a 2-part 2-design without taking the full product. Partition the blocks of \(\varDelta _1\) into r classes of size \(b_1/r\) in any way at all, and match these classes to the resolution classes of \(\varDelta _2\) in any way. For each matched pair, construct the cartesian product design. Putting these products together gives a design of the required type with \(b_1b_2/r\) blocks, considerably fewer than the \(b_1b_2\) blocks in the entire product of \(\varDelta _1\) and \(\varDelta _2\).

More generally, if the design \(\varDelta _2\) is c-partitionable and c divides \(b_1\) then replace the resolution classes in this construction by the c classes of blocks. This gives a 2-part 2-design with \(b_1b_2/c\) blocks. Putting \(c=1\) gives Construction 1 as a special case of this.
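Construction 2 can be sketched in Python as follows. The example takes the three pairs from three cancer types as \(\varDelta _1\), with each block forming its own class, and the resolved design of all pairs from four drugs as \(\varDelta _2\), so that \(r=3\). The function names and labellings are assumptions made for this illustration.

```python
from itertools import combinations

def subcartesian_product(classes1, classes2):
    """Construction 2: take cartesian products of matched classes only."""
    assert len(classes1) == len(classes2)
    return [(c, d) for cls1, cls2 in zip(classes1, classes2)
            for c in cls1 for d in cls2]

def profile(blocks, v1, v2):
    """Sets of values taken by lambda11, lambda22 and lambda12."""
    lam11 = {sum(1 for c, d in blocks if i in c and j in c)
             for i, j in combinations(range(v1), 2)}
    lam22 = {sum(1 for c, d in blocks if i in d and j in d)
             for i, j in combinations(range(v2), 2)}
    lam12 = {sum(1 for c, d in blocks if i in c and j in d)
             for i in range(v1) for j in range(v2)}
    return lam11, lam22, lam12

# Delta_1: the 3 pairs from 3 cancer types, split into r = 3 classes of size 1.
classes1 = [[frozenset({0, 1})], [frozenset({0, 2})], [frozenset({1, 2})]]
# Delta_2: all pairs from 4 drugs, arranged in its 3 resolution classes.
classes2 = [[frozenset({0, 1}), frozenset({2, 3})],
            [frozenset({0, 2}), frozenset({1, 3})],
            [frozenset({0, 3}), frozenset({1, 2})]]

design = subcartesian_product(classes1, classes2)
print(len(design))              # 6 blocks, rather than 3 * 6 = 18
print(profile(design, 3, 4))    # ({2}, {1}, {2})
```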

Figures 8 and 9 show two possibilities when \(v_1=v_2=4\), \(k_1=k_2=2\) and \(r=3\).

Fig. 8 Design for 4 cancer types and 4 drugs, using 12 blocks, each with 2 cancer types and 2 drugs. This can be made by Construction 2 and by Construction 3

Fig. 9 Design for 4 cancer types and 4 drugs, using 12 blocks, each with 2 cancer types and 2 drugs. This can be made by Construction 2 but not by Construction 3

Table 2 shows some parameter sets for designs that can be made by Construction 2, possibly after an interchange or a swap, with \(k_i\le 10\) for \(i=1\) and \(i=2\). See the database in DesignTheory.org (2012) for the resolved designs used.

There are two special cases. When \(b_1=r\) then we simply match the blocks of \(\varDelta _1\) to the resolution classes of \(\varDelta _2\). When \(v_1=3\), \(k_1=2\), \(v_2=4\), \(k_2=2\) and \(r=3\), this gives the design in Fig. 10. When \(v_1=v_2=6\), \(k_1=k_2=3\) and \(r=10\), this gives the design in Fig. 11. When \(v_1=7\), \(k_1=3\), \(v_2=15\), \(k_2=3\) and \(r=7\), this gives a 2-part 2-design with \(b=35\), \(r_1=15\), \(r_2=7\), \(\lambda _{11}=5\), \(\lambda _{22}=1\) and \(\lambda _{12}=3\).

On the other hand, if \(\varDelta _1\) is also resolved with replication r then we may match the resolution classes of the two designs. For example, when \(v_1=v_2=4\) and \(k_1=k_2=2\) then we may take \(r=3\) and \(b_1=b_2=6\) to get the design in Fig. 8. This is not even weakly isomorphic to the design in Fig. 9, where the pairs of blocks from \(\varDelta _1\) do not form resolution classes. When \(v_1/k_1 = v_2/k_2=2\) and \(b_1=b_2\), Construction 3 also gives designs with these parameters.

Table 2 Parameter sets for the designs with the least number of blocks with \(k_1\le 10\) and \(k_2\le 10\) that can be made by Constructions 2 or 3 but not 1: \(v_1\) is the number of cancer types, \(v_2\) is the number of drugs, and b is the number of blocks, each of which has \(k_1\) cancer types and \(k_2\) drugs; r is a number used in Construction 2

At first sight, the two general constructions given by Mukerjee (1998) are special cases of this. His first construction needs both \(\varDelta _1\) and \(\varDelta _2\) to be c-partitionable, and matches the classes. This includes the cartesian product when \(c=1\), and when \(c=3\) it gives the design in Fig. 8 but not the one in Fig. 9. His second construction uses a c-partitionable design \(\varDelta _2\) only when \(b_1=c\). However, if c divides \(b_1\) then we may replace \(\varDelta _2\) by \(b_1/c\) copies of it, giving a \(b_1\)-partitionable design whose classes can be matched to the blocks of \(\varDelta _1\).

Thus Construction 2 is precisely equivalent to the combination of the two in Mukerjee (1998).

Some 2-part 2-designs in which \(v_1=v_2\) and \(k_1=k_2 =v_1/2\) arise from Construction 2. Put \(n=k_1\). Suppose that \(\varDelta _0\) is a 2-design for 2n treatments in \(2r_0\) blocks of size n. If \(\varDelta _0\) is resolvable then we may put \(\varDelta _1=\varDelta _2=\varDelta _0\) in Construction 2, and match the resolution classes of \(\varDelta _1\) and \(\varDelta _2\) to obtain an \(r_0\)-partitionable 2-part 2-design in \(4r_0\) blocks. Figure 8 gives an example with \(n=2\). If \(\varDelta _0\) is not resolvable, then let \(\varDelta _2\) be the design with \(4r_0\) blocks consisting of \(\varDelta _0\) and its complement. This is resolvable, with \(r=2r_0\). Put \(\varDelta _1=\varDelta _0\) and apply Construction 2, matching the blocks of \(\varDelta _1\) to the replicates of \(\varDelta _2\). Again, this gives a 2-part 2-design in \(4r_0\) blocks. However, this design is not \(r_0\)-partitionable, because its C-design is not resolvable. Figure 11 shows an example with \(n=3\).

There may sometimes be operational reasons for preferring resolvable designs. Moreover, they can be used as ingredients in Construction 9 in Sect. 4 to give designs without too many blocks. The next construction always gives resolvable designs for such parameters.

Construction 3

(Hadamard matrices) Start with a Hadamard matrix H of order 4n in which the elements in the first row are all \(+1\). Identify the 2n cancer types with the columns in which the second row has entry \(+1\), and identify the 2n drugs with the columns in which the second row has entry \(-1\). Each of the remaining rows gives two blocks, one containing all the objects whose columns have entries \(+1\), and one containing all the objects whose columns have entries \(-1\). Thus \(b=8n-4\). Moreover, each pair of blocks contains each cancer type and each drug just once, in the concise representation, so the 2-part 2-design is \((4n-2)\)-partitionable and the lower bound in inequality (5) is achieved.

Fig. 10 Design for 3 cancer types and 4 drugs, using 6 blocks, each with 2 cancer types and 2 drugs

Fig. 11 A design for 6 cancer types and 6 drugs, using 20 blocks, made from Construction 2

Fig. 12 A design for 6 cancer types, 6 drugs and 5 biomarkers, using 20 blocks, made by Construction 3 followed by Construction 9

For example, when \(n=3\) we can take

$$\begin{aligned} H = \left[ \begin{array}{cccccccccccc} +1 &{} +1 &{}+1 &{}+1 &{}+1 &{}+1 &{}+1 &{}+1 &{}+1 &{}+1 &{}+1 &{}+1\\ +1 &{}+1 &{}+1 &{}+1 &{}+1 &{}+1 &{}-1 &{} -1 &{} -1 &{} -1 &{} -1 &{} -1\\ +1 &{} -1 &{} +1 &{} -1 &{} +1 &{} -1 &{} +1 &{} -1 &{} -1 &{} +1 &{} +1 &{} -1\\ +1 &{} -1 &{} -1 &{} -1 &{} +1 &{} +1 &{} -1 &{} -1 &{} +1 &{} -1 &{} +1 &{} +1\\ +1 &{} +1 &{} +1 &{} -1 &{} -1 &{} -1 &{} -1 &{} +1 &{} +1 &{} -1 &{} +1 &{} -1\\ +1 &{} -1 &{} -1 &{} +1 &{} +1 &{} -1 &{} +1 &{} +1 &{} +1 &{} -1 &{} -1 &{} -1\\ +1 &{} -1 &{} -1 &{} +1 &{} -1 &{} +1 &{} -1 &{} +1 &{} -1 &{} +1 &{} +1 &{} -1\\ +1 &{} -1 &{} +1 &{} +1 &{} -1 &{} -1 &{} -1 &{} -1 &{} +1 &{} +1 &{} -1 &{} +1\\ +1 &{} +1 &{} -1 &{} -1 &{} +1 &{} -1 &{} -1 &{} +1 &{} -1 &{} +1 &{} -1 &{} +1\\ +1 &{} +1 &{} -1 &{} +1 &{} -1 &{} -1 &{} +1 &{} -1 &{} -1 &{} -1 &{} +1 &{} +1\\ +1 &{} +1 &{} -1 &{} -1 &{} -1 &{} +1 &{} +1 &{} -1 &{} +1 &{} +1 &{} -1 &{} -1\\ +1 &{} -1 &{} +1 &{} -1 &{} -1 &{} +1 &{} +1 &{} +1 &{} -1 &{} -1 &{} -1 &{} +1 \end{array} \right] . \end{aligned}$$

Labelling the columns as \(C_1\), ..., \(C_6\), \(D_1\), ..., \(D_6\) in order, the construction gives the design in the first three columns of Fig. 12, ignoring the biomarkers. It is not weakly isomorphic to the design in Fig. 11, because all triples of cancer types and all triples of drugs occur.

The asterisked entries in Table 2 show the parameters of the smallest designs that can be constructed by this method.

When \(n=2\) this construction gives the design in Fig. 8. For some values of n, different choices of Hadamard matrix, or different designations of which row is second, can give non-isomorphic designs. It may be that there are some values of n for which there exists a Hadamard matrix of order 4n but no 2-\((2n,n,n-1)\) design. If so, Construction 3 gives a design for these parameters but Construction 2 does not. Such a value of n is likely to be too large to affect designs of practical size.
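The steps of Construction 3 can be sketched in Python. As an assumption made to keep the example short, the sketch uses the Sylvester Hadamard matrix of order 8 (so that there are 4 cancer types and 4 drugs) rather than the matrix of order 12 displayed above; the function names and the integer labelling are also hypothetical.

```python
from itertools import combinations

def sylvester(order):
    """Sylvester Hadamard matrix; order must be a power of 2."""
    H = [[1]]
    while len(H) < order:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H

def hadamard_design(H):
    """Construction 3: two blocks from each row after the first two."""
    assert all(x == 1 for x in H[0])
    cancer_cols = [j for j, x in enumerate(H[1]) if x == 1]
    drug_cols = [j for j, x in enumerate(H[1]) if x == -1]
    blocks = []
    for row in H[2:]:
        for sign in (1, -1):
            c = frozenset(i for i, j in enumerate(cancer_cols) if row[j] == sign)
            d = frozenset(i for i, j in enumerate(drug_cols) if row[j] == sign)
            blocks.append((c, d))
    return blocks

def profile(blocks, v1, v2):
    """Sets of values taken by lambda11, lambda22 and lambda12."""
    lam11 = {sum(1 for c, d in blocks if i in c and j in c)
             for i, j in combinations(range(v1), 2)}
    lam22 = {sum(1 for c, d in blocks if i in d and j in d)
             for i, j in combinations(range(v2), 2)}
    lam12 = {sum(1 for c, d in blocks if i in c and j in d)
             for i in range(v1) for j in range(v2)}
    return lam11, lam22, lam12

design = hadamard_design(sylvester(8))
print(len(design))              # 12
print(profile(design, 4, 4))    # ({2}, {2}, {3})
```

Consecutive blocks in the list come from the same row, so each such pair contains every cancer type and every drug exactly once; with \(c=6\) classes the bound in inequality (5) is \(4+4+6-2=12\), which equals the number of blocks.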

Construction 4

(Symmetric 2-designs) Here is another general method of construction. Consider a symmetric balanced incomplete-block design \(\varDelta \) for v treatments in v blocks of size k. Every pair of distinct treatments concur in \(\lambda \) blocks, where \(\lambda = k(k-1)/(v-1)\), and every pair of distinct blocks have \(\lambda \) treatments in common. Let \(\Gamma \) be one block of \(\varDelta \). Identify the treatments in \(\Gamma \) with k drugs \(D_1\), ..., \(D_k\) and the remaining treatments with \(v-k\) cancer types \(C_1\), ..., \(C_{v-k}\). Now consider the design \(\varDelta '\) consisting of all blocks of \(\varDelta \) except \(\Gamma \). Each of these blocks contains \(\lambda \) drugs and \(k-\lambda \) cancer types. In \(\varDelta '\), each pair of drugs concur in \(\lambda -1\) blocks; each pair of cancer types concur in \(\lambda \) blocks; and each drug occurs with each cancer type in \(\lambda \) blocks. Thus \(b = v-1\), \(v_1=v-k\), \(v_2=k\), \(k_1=k-\lambda \), \(k_2=\lambda \), \(\lambda _{11} = \lambda _{12}= \lambda \) and \(\lambda _{22} = \lambda -1\).

We can use Construction 4 whenever there exists a symmetric 2-\((v,k,\lambda )\) design with \(v=v_1+v_2\), \(k=v_2\) and \(\lambda =k_2\), provided that \(k_1+k_2=v_2\). In order to satisfy condition (d), \(\lambda \) must be bigger than one. The lower bound in inequality (4) is always met.

The properties of symmetric 2-designs guarantee that conditions (c) and (d) hold, but they also match up the blocks of the C-design and the D-design, which typically produces fewer blocks than previous construction methods.

The design in Fig. 1 can be obtained by this construction with \(v=11\), \(k=5\) and \(\lambda =2\). Figure 10 gives the design with \(v=7\), \(k=4\) and \(\lambda =2\).
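Construction 4 can be checked mechanically on the first of these parameter sets. The sketch below is illustrative only: it assumes that the symmetric 2-(11, 5, 2) design is obtained by developing the quadratic residues modulo 11 (our choice of starting design, not one specified above), and the function names are hypothetical. It produces a design with the parameters of Fig. 1, without claiming to reproduce that figure block for block.

```python
from itertools import combinations, product

def develop(base, v):
    """Develop a difference set modulo v into the blocks of a symmetric 2-design."""
    return [frozenset((x + s) % v for x in base) for s in range(v)]

def construction4(blocks, gamma):
    """Drop the block gamma; its points become drugs, all other points cancer types."""
    points = set().union(*blocks)
    drugs = set(gamma)
    cancers = points - drugs
    design = [(b & cancers, b & drugs) for b in blocks if b != gamma]
    return design, cancers, drugs

def concurrences(design, cancers, drugs):
    """Sets of values taken by (lambda_11, lambda_12, lambda_22)."""
    l11 = {sum(1 for c, _ in design if {x, y} <= c)
           for x, y in combinations(sorted(cancers), 2)}
    l12 = {sum(1 for c, d in design if x in c and y in d)
           for x, y in product(cancers, drugs)}
    l22 = {sum(1 for _, d in design if {x, y} <= d)
           for x, y in combinations(sorted(drugs), 2)}
    return l11, l12, l22

# Assumption: quadratic residues mod 11 form a (11, 5, 2) difference set.
blocks = develop({1, 3, 4, 5, 9}, 11)
design, cancers, drugs = construction4(blocks, blocks[0])
print(len(design), len(cancers), len(drugs))   # b, v_1, v_2
print(concurrences(design, cancers, drugs))    # one value per set means balance
```

Each returned set contains a single value exactly when the corresponding concurrence is constant, so the output can be compared directly with \(\lambda _{11}=\lambda _{12}=\lambda \) and \(\lambda _{22}=\lambda -1\).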

Table 3 lists parameter sets for small designs that can be constructed by this method, with an interchange and swaps where necessary, again using Table I.1 in Hall (1986). After allowing for possible interchanges and swaps, this table represents 38 designs.

Table 3 Parameter sets for which small designs can be made by Construction 4: \(v_1\) is the number of cancer types, \(v_2\) is the number of drugs, and b is the number of blocks, each of which has \(k_1\) cancer types and \(k_2\) drugs; v, k and \(\lambda \) are parameters of the symmetric 2-design used in the construction

Construction 5

(Augmentation) Given a 2-part 2-design \(\varDelta \) in which \(v_2=2k_2+1\), we may augment it to one for one more drug by increasing \(v_2\) to \(v_2+1\) and \(k_2\) to \(k_2+1\) while merely doubling the number of blocks. Replace each block of \(\varDelta \) by two blocks, both with the same set of cancer types as before. One of these blocks has the previous set of drugs and the extra drug, while the other has all the remaining drugs.

For example, augmenting the design in Fig. 1 gives the design in Fig. 11.
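The augmentation step is short enough to sketch in code. The following is a minimal illustration under stated assumptions: the starting design is built by Construction 4 from the quadratic residues modulo 11 (so it has the parameters of Fig. 1 but is not guaranteed to be that design), the extra drug is given the hypothetical label 11, and the function names are ours.

```python
from itertools import combinations, product

def augment(design, drugs, new_drug):
    """Replace each block by two blocks with the same cancer types:
    one adds new_drug to the old drugs, the other uses the remaining old drugs."""
    out = []
    for cancer_set, drug_set in design:
        out.append((cancer_set, drug_set | {new_drug}))
        out.append((cancer_set, frozenset(drugs) - drug_set))
    return out

def lambdas(design, cancers, drugs):
    """Sets of values taken by (lambda_11, lambda_12, lambda_22)."""
    l11 = {sum(1 for c, _ in design if {x, y} <= c)
           for x, y in combinations(sorted(cancers), 2)}
    l12 = {sum(1 for c, d in design if x in c and y in d)
           for x, y in product(cancers, drugs)}
    l22 = {sum(1 for _, d in design if {x, y} <= d)
           for x, y in combinations(sorted(drugs), 2)}
    return l11, l12, l22

# Starting design (assumption): Construction 4 applied to the quadratic residues mod 11.
v, base = 11, {1, 3, 4, 5, 9}
sym = [frozenset((x + s) % v for x in base) for s in range(v)]
gamma = sym[0]
cancers, drugs = set(range(v)) - gamma, set(gamma)
design = [(blk - gamma, blk & gamma) for blk in sym[1:]]

bigger = augment(design, drugs, new_drug=11)   # label 11 is not used by any point
print(len(bigger), lambdas(bigger, cancers, drugs | {11}))
```

Because \(v_2=2k_2+1\), the complementary set of old drugs has size \(k_2+1\), so both new blocks use the same number of drugs.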

Applying the augmentation just to the D-design gives a resolvable 3-design, as shown in the Extension Theorem of Alltop (1972). This can be used directly in Construction 2. However, augmentation is such a straightforward way of obtaining one 2-part 2-design from another that we think it is worth identifying.

Construction 6

(Group divisible designs) If \(v_1=v_2=v\) and \(k_1=k_2=k\) then the zipped form of a 2-part 2-design is a semi-regular group-divisible incomplete-block design for two so-called groups of v treatments in blocks of size 2k with \(k>1\): see Bose and Connor (1952). Unzipping any one of these gives a 2-part 2-design.

Table VII of Clatworthy (1973) gives three such designs. Unzipping them gives the product design for the first parameter set in Table 1, the design in Fig. 8 and the design in the first three columns of Fig. 12.

Construction 7

(Group actions) Here is a construction based on group actions. Suppose that the group G acts 2-transitively on two sets C and D of sizes \(v_1\) and \(v_2\) respectively, and that G is also transitive in the induced action on \(C\times D\). Choose a subset of C and a subset of D, each containing at least two points; their union is a block, and the images of this block under G give the remaining blocks. The blocks have to be unzipped to give a 2-part 2-design. This does not give much control over b, except that we know it is a divisor of the order of G. A strategy for finding good designs by this method is to choose a subgroup H of G which acts intransitively on each of C and D, and to use fixed sets of H on C and D in the construction.

The three examples below arise from this construction, but can more easily be derived from the 3-(22, 6, 1) design \(\varXi \) whose automorphism group is the Mathieu group \(M_{22}\). It has 22 points and 77 blocks of size 6, any two blocks meeting in zero or two points; see Cameron and van Lint (1991, Chaps. 1 and 9). For simplicity, we describe the cancer types as red points and the drugs as green points.

Take a block \(B_0\) of the design \(\varXi \); its points are red, and the remaining 16 points are green. For each of the 60 blocks meeting \(B_0\) in two points, we define a block of our new design containing two red and four green points. Now two red points lie in five blocks, one of which is \(B_0\); so they lie in four more blocks. A red point and a green point lie together in five blocks, each containing two red points. Two green points lie together in three blocks meeting \(B_0\): each of the six points of \(B_0\) lies in a unique block with them, and each such block contains two points of \(B_0\). So we have an example with \(v_1=6\), \(v_2=16\), \(b=60\), \(k_1=2\), \(k_2=4\), and \((\lambda _{11},\lambda _{12},\lambda _{22})=(4,5,3)\).

The other two examples use the 4-(23, 7, 1) design \(\varTheta \) in which the blocks through the extra point are formed by adjoining that point to the blocks of \(\varXi \): see Cameron and van Lint (1991). The counting arguments that verify their properties are similar to what we have just seen.

For the second design, we take a set A of seven points which form a block of \(\varTheta \) not containing the extra point. These will be red, and the remaining 15 points of \(\varXi \) green. Any block of \(\varXi \) meets A in one or three points; we take the blocks meeting A in three points to be the blocks of the required design. We obtain an example with \(v_1=7\), \(v_2=15\), \(b=35\), \(k_1=k_2=3\), and \((\lambda _{11},\lambda _{12},\lambda _{22})=(5,3,1)\).

This has the same parameters as the fifth design made using Construction 2.

Finally, using the 23-point design \(\varTheta \) but not throwing away the extra point we obtain a design with \(v_1=7\), \(v_2=16\), \(b=140\), \(k_1=3\), \(k_2=4\), and \((\lambda _{11},\lambda _{12},\lambda _{22})=(20,15,7)\). Another design with these parameters is the cartesian product of the projective plane of order 2 and the affine plane of order 4; these designs are not isomorphic.

To build these from the group action construction, the relevant groups are the stabilizers of the sets of six or seven red points in the appropriate Mathieu groups; these are the groups \(2^4:S_6\), \(A_7\), and \(2^4:A_7\) respectively.

3 Generalizing the design problem

3.1 The extended problem

In March 2016 Valerii Fedorov extended the problem as follows. Can we add a third factor, whose levels are biomarkers in this case, subject to the obvious extra conditions? Here we generalize this to an arbitrary number m of factors.

The conditions for an m-part 2-design are as follows. The analogue of conditions (a)–(b) is that, for \(1\le i \le m\), factor i has \(v_i\) levels and each medical centre involves \(k_i\) of them, where \(k_i<v_i\); the analogue of conditions (c)–(d) is that, for \(1\le i\le m\), each pair of levels of factor i are used together at the same non-zero number \(\lambda _{ii}\) of medical centres.

The generalization of condition (e) is less clear. When \(m=3\), a weak generalization is that each biomarker is used on each cancer type at the same number \(\lambda _{13}\) of medical centres and that each biomarker is used with each drug at the same number \(\lambda _{23}\) of medical centres. For now, we use this weak version. Note, however, that this gets us into the territory of factorial design, so we might be confounding all or part of a two-factor interaction with all or part of a main effect. By analogy with orthogonal arrays (Hedayat et al. 1999), we call this weak generalization a 3-part 2-design with strength 2, whereas a 3-part 2-design with strength 3 would have every triple of (cancer type, drug, biomarker) at the same number \(\lambda _{123}\) of medical centres.

Thus the strength-2 generalization of condition (e) is that, for \(1 \le i <j \le m\), each level of factor i occurs with each level of factor j at the same number \(\lambda _{ij}\) of medical centres.
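These conditions are easy to test by direct counting. The sketch below is one possible checker, not part of the original development. It assumes that a design is stored as a list of blocks, each block being a tuple of m sets of levels (one set per factor), and the function names are hypothetical.

```python
from itertools import combinations, product

def concurrence_values(design, levels):
    """Map (i, i) to the set of lambda_ii values and (i, j) to the set of lambda_ij values."""
    m = len(levels)
    result = {}
    for i in range(m):
        result[i, i] = {sum(1 for blk in design if x in blk[i] and y in blk[i])
                        for x, y in combinations(sorted(levels[i]), 2)}
        for j in range(i + 1, m):
            result[i, j] = {sum(1 for blk in design if x in blk[i] and y in blk[j])
                            for x, y in product(levels[i], levels[j])}
    return result

def has_strength_2(design, levels):
    """True if every concurrence is constant and non-zero."""
    values = concurrence_values(design, levels).values()
    return all(len(vals) == 1 and 0 not in vals for vals in values)

# Illustration: the full product of three copies of the 2-(3, 2, 1) design.
pairs = [frozenset(p) for p in combinations(range(3), 2)]
design = list(product(pairs, repeat=3))
levels = [set(range(3))] * 3
print(concurrence_values(design, levels))
print(has_strength_2(design, levels))
```

Deleting a single block from this product destroys the balance, which gives a quick negative check of the function.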

3.2 Conditions on parameters in the extended problem

The definition of c-partitionable extends to m-part 2-designs in the obvious way.

Theorem 2

In an m-part 2-design of strength 2, all of the following are satisfied.

  1. (i)

    The analogues of Eqs. (1) and (2) hold for each factor.

  2. (ii)

    Equation (3) generalizes to \(bk_ik_j = v_iv_j\lambda _{ij}\) for \(1\le i<j\le m\).

  3. (iii)

    If the design is c-partitionable then \(b \ge v_1+ \cdots + v_m +c-m\).

  4. (iv)

    In particular, \(b \ge v_1 + \cdots + v_m - m+1\).

Proofs are similar to those in Sect. 1.5. Part (iv) is precisely Theorem 1 of Mukerjee (1998).
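As a rough aid for screening parameter sets, the necessary conditions of Theorem 2 can be evaluated numerically. The sketch below assumes that the analogues of Eqs. (1) and (2) are the usual counting identities \(v_ir_i = bk_i\) and \(\lambda _{ii}(v_i-1) = r_i(k_i-1)\), where \(r_i\) is the replication of each level of factor i; the function names are hypothetical. Passing these checks does not show that a design exists.

```python
from fractions import Fraction

def implied_lambdas(b, v, k):
    """Concurrences forced by counting, given b blocks and per-factor sizes v[i], k[i]."""
    m = len(v)
    lam = {}
    for i in range(m):
        lam[i, i] = Fraction(b * k[i] * (k[i] - 1), v[i] * (v[i] - 1))
        for j in range(i + 1, m):
            lam[i, j] = Fraction(b * k[i] * k[j], v[i] * v[j])
    return lam

def admissible(b, v, k, c=1):
    """Necessary conditions only: integrality and the bound b >= sum(v) + c - m."""
    replications_ok = all((b * ki) % vi == 0 for vi, ki in zip(v, k))
    lambdas_ok = all(x.denominator == 1 for x in implied_lambdas(b, v, k).values())
    bound_ok = b >= sum(v) + c - len(v)
    return replications_ok and lambdas_ok and bound_ok

# The three parameter sets obtained from the Mathieu designs in Construction 7.
for b, v, k in [(60, (6, 16), (2, 4)), (35, (7, 15), (3, 3)), (140, (7, 16), (3, 4))]:
    lam = implied_lambdas(b, v, k)
    print(b, v, k, lam[0, 0], lam[0, 1], lam[1, 1], admissible(b, v, k))
```

Exact rational arithmetic is used so that a non-integral concurrence is detected rather than hidden by rounding.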

4 Constructions of m-part 2-designs of strength at least 2

4.1 Two main constructions

Here we give the main construction of Mukerjee (1998) in the language of this paper.

Construction 8

(Orthogonal arrays) Suppose that there is a positive integer c such that, for \(i=1\), ..., m, \(\varDelta _i\) is a c-partitionable 2-design for \(v_i\) treatments in \(b_i\) blocks of size \(k_i\). Moreover, there is an orthogonal array \(\Gamma \) with m columns, where column i contains \(b_i/c\) symbols for \(1\le i \le m\).

Match the c classes of blocks of \(\varDelta _1\), ..., \(\varDelta _m\). For \(j=1\), ..., c separately, each row \(\rho \) of \(\Gamma \) gives a block of the new design, as follows. For \(i=1\), ..., m, identify the block in class j of \(\varDelta _i\) labelled by the symbol in row \(\rho \) and column i of \(\Gamma \): then form the cartesian product of these m blocks. This gives a c-partitionable m-part 2-design in sc blocks, where s is the number of rows of \(\Gamma \). The strength of this new design is equal to the strength of the orthogonal array \(\Gamma \).

In one extreme case, \(\Gamma \) has all possible different rows, so that \(s=\left( \prod _{i=1}^m b_i\right) /c^m\). If, in addition, \(c=1\), then \(s=\prod _{i=1}^m b_i\) and we obtain the full cartesian product.

The design in Fig. 13 can be made in this way with \(c=1\), using an orthogonal array with three columns, each with three symbols.

An example with \(m=3\) and \(c=3\) is shown in Fig. 14, which is contained in Table 1 of Mukerjee (1998). Here, \(v_i=4\), \(k_i=2\) and \(b_i=6\) for \(i=1\), 2, 3, and each of \(\varDelta _1\), \(\varDelta _2\) and \(\varDelta _3\) can be resolved into three pairs of blocks. For each design, label the replicates 1, 2, 3 in any order. For \(j=1\), 2, 3, combine the j-th replicates from the three designs, not by the full cartesian product, which would give eight blocks, but by using an orthogonal array of strength 2 with four rows and three columns, each with two symbols. This 3-part 2-design has strength 2 but not strength 3.

Fig. 13 A 3-part 2-design for 3 cancer types, 3 drugs and 3 biomarkers, made from Construction 8

Fig. 14 A 3-part 2-design for 4 cancer types, 4 drugs and 4 biomarkers, made from Construction 8
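The example with \(m=3\) and \(c=3\) can be imitated in a few lines. The sketch below is an illustration under stated assumptions: it uses the same resolution of the 2-(4, 2, 1) design for all three factors, with replicates matched in the listed order, and one particular orthogonal array of strength 2. It therefore gives a design with the parameters of Fig. 14, which need not coincide with the design in that figure.

```python
from itertools import combinations, product

# Resolution of the 2-(4, 2, 1) design into c = 3 replicates of two blocks each.
replicates = [
    [frozenset({0, 1}), frozenset({2, 3})],
    [frozenset({0, 2}), frozenset({1, 3})],
    [frozenset({0, 3}), frozenset({1, 2})],
]
# Orthogonal array of strength 2: four rows, three columns, two symbols per column.
oa = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

# For each class j and each row of the array, take one block per factor.
design = [tuple(replicates[j][symbol] for symbol in row)
          for j in range(3) for row in oa]

levels = range(4)
lam_within = {sum(1 for blk in design if {x, y} <= blk[i])
              for i in range(3) for x, y in combinations(levels, 2)}
lam_between = {sum(1 for blk in design if x in blk[i] and y in blk[j])
               for i, j in combinations(range(3), 2)
               for x, y in product(levels, repeat=2)}
lam_triple = {sum(1 for blk in design if x in blk[0] and y in blk[1] and z in blk[2])
              for x, y, z in product(levels, repeat=3)}

print(len(design), lam_within, lam_between, lam_triple)
```

A single value in `lam_within` and in `lam_between` confirms strength 2, while more than one value in `lam_triple` shows that the design does not have strength 3.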

Table 1 of Sitter (1993) gives a 7-part 2-design made in this way with \(b=24\) and \(v_i=2k_i=4\) for \(i=1\), ..., 7.

Construction 9

(Products of multi-part designs) The ingredients of the previous construction are m individual 2-designs and an orthogonal array, which may be trivial. Instead, we may start with multi-part 2-designs, or an assortment of 2-designs and multi-part 2-designs. The use of orthogonal arrays and/or c-partitioning can be extended to this method too. As in Construction 2, we can allow one of the constituent designs to be not c-partitionable, so long as its number b of blocks is divisible by c.

The full product of an \(m_1\)-part 2-design \(\varTheta _1\) with \(b_1\) blocks and an \(m_2\)-part 2-design \(\varTheta _2\) with \(b_2\) blocks is an \((m_1+m_2)\)-part 2-design with \(b_1b_2\) blocks and strength 2. If \(\varTheta _1\) has strength \(m_1\) and \(m_2=1\) then the full product has strength \(m_1+1\). For example, if \(m=3\), \(v_3=3\) and \(k_3=2\) then the product of the design in Fig. 1 and a 2-design with three blocks of size 2 gives a 3-part 2-design with 30 blocks and strength 3.
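The claim about strength 3 can be confirmed by counting triples. The sketch below is illustrative only: in place of the design in Fig. 1 it uses a 2-part 2-design with the same parameters, obtained by Construction 4 from the quadratic residues modulo 11 (an assumption on our part), and combines it with the 2-design whose blocks are the three pairs from three biomarkers.

```python
from itertools import combinations, product

# 2-part 2-design with b = 10, v_1 = 6, v_2 = 5, k_1 = 3, k_2 = 2 (assumed stand-in for Fig. 1).
v, base = 11, {1, 3, 4, 5, 9}
sym = [frozenset((x + s) % v for x in base) for s in range(v)]
gamma = sym[0]
cancers, drugs = set(range(v)) - gamma, set(gamma)
two_part = [(blk - gamma, blk & gamma) for blk in sym[1:]]

# 2-design for three biomarkers in three blocks of size 2.
biomarkers = range(3)
third = [frozenset(p) for p in combinations(biomarkers, 2)]

# Full product: every block of the first design with every block of the second.
full = [(c, d, m) for (c, d), m in product(two_part, third)]

triple_counts = {sum(1 for c, d, m in full if x in c and y in d and z in m)
                 for x, y, z in product(cancers, drugs, biomarkers)}
print(len(full), triple_counts)
```

A single value in `triple_counts` means that every (cancer type, drug, biomarker) triple occurs in the same number of blocks.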

As an example of the relaxation of the c-partitionable condition, suppose that \(\varTheta \) is a c-partitionable 2-part 2-design for drugs and cancer types and \(\varDelta \) is a 2-design for \(v_3\) biomarkers in c blocks of size \(k_3\). We can simply match the blocks of \(\varDelta \) to the classes of \(\varTheta \) in any way. The 3-part 2-design in Fig. 12 was made like this by starting with a 2-part 2-design made by Construction 3, grouping blocks into ten classes of the form \(\{2i-1,2i\}\), and matching these classes to the ten blocks of a 2-design \(\varDelta \) for five biomarkers. Similarly, if \(v_1=v_2=v_3=6\) and \(k_1=k_2=k_3=3\) we can obtain a 3-part 2-design in 20 blocks by matching the ten blocks of a 2-(6, 3, 2) design to the ten classes in the left-hand side of Fig. 12.

Fig. 15 A 4-part 2-design

The special case of the last part of this construction with \(b=c\) is the second general construction given by Mukerjee (1998). As noted in Sect. 2, this specialization does not restrict his designs. However, because we have now given more constructions for the case that \(m=2\), applying the various product constructions to them produces new designs for higher values of m also.

4.2 Other constructions

The augmentation method in Construction 5 easily generalizes to three or more factors. If \(v_i =2k_i +1\) then \(v_i\) and \(k_i\) can be increased by one while the number of blocks is merely doubled.

If \(v_1= \cdots =v_m = v\) and \(k_1= \cdots =k_m=k\) then the zipped form of an m-part 2-design is a semi-regular group-divisible design for mv treatments in blocks of size mk with \(k>1\). Just as in Construction 6, any such design can be unzipped to give an m-part 2-design. There are two such designs with \(m=3\) in Table VII of Clatworthy (1973). Their unzipped forms are the designs in Figs. 13 and 14. The one with \(m=4\) gives the design in Fig. 15, which can also be obtained from a \(9\times 4\) orthogonal array with three symbols in each column.

The group method in Construction 7 also easily extends to three or more factors: simply take a permutation group with more than two 2-transitive actions.