On the nature of saturated $2^k$-factorial designs for unbiased estimation of non-negligible parameters

https://doi.org/10.1016/j.jspi.2020.05.006

Highlights

  • Let $H_N$ be a Hadamard matrix partitioned as $H_N = \begin{pmatrix} D & E \\ V & C \end{pmatrix}$.

  • $|\det(D)|$ is proportional to $|\det(C)|$.

  • $D^{-1} = \frac{1}{N}\left[D - E C^{-1} V\right]^T$.

  • A saturated D-optimal design for mean, main effects and second order interactions.

Abstract

We contemplate an experimental situation in a $2^k$-factorial experiment with an acute resource crunch, so that we need to conduct just a saturated design [SD], with the understanding that the precision of the estimates cannot be estimated from the data. It is known beforehand which effect(s)/interaction(s) are likely to be negligible. We examine the extent to which an experimenter has flexibility in choosing an SD that retains information on all the remaining [non-negligible] effects/interactions.

Introduction

Two-level factorial designs (TLFD) are widely used in scientific and industrial experimentation for various reasons. Standard textbooks deal with this topic at various lengths, covering concepts such as (i) Unreplicated Full Factorials, (ii) Replicated Full Factorials, (iii) Blocking, (iv) Total, Partial and Balanced/Unbalanced Confounding, (v) Fractional Factorials, etc. Practitioners primarily use TLFD at an early stage of experimentation to screen potential factors that are involved in the system being investigated. The statistical models underlying TLFD are simple and subject to relatively weak assumptions. Each factor, whether quantitative or qualitative, is assumed to have two levels that are conveniently coded as $-1$ or $+1$ in the design matrix, which turns out to be a Hadamard matrix. Hadamard matrices have been studied extensively in the literature. As a case in point, see Friedland and Aliabadi (2018) for classical properties of Hadamard matrices. The estimators of the effects/interactions of TLFD are contrasts that are naturally simple to interpret. The effect of a factor is interpreted as a measure of the change in the response variable due to the variation of the factor from low to high, averaged over all other factor levels.

In practice, investigators postulate, for one reason or another, that certain parameters (usually higher order interactions) are unimportant or negligible. When that is the case, it is desirable for them to conduct the experiment with the least number of runs that would ensure unbiased estimation of the important parameters, that is, the non-negligible main effects and interactions of interest. Regular Fractional Factorial Designs (RFFD) are used in this kind of situation and there is a vast literature available on RFFD. See, for example, Montgomery (1996).

In the framework of a two-level factorial design, one of the drawbacks of RFFD is that the number of runs needed to conduct the experiment is necessarily a multiple of 4. Thus, when the important effects and interactions to be estimated are identified beforehand, using an RFFD may lead to the use of more resources than the bare minimum needed for the estimation of the important effects and interactions. For instance, if the number of factors is $k=5$ and the only important effects are the main effects plus the mean, then using a $2_{III}^{5-2}$ RFFD of resolution III would require 8 runs for the experiment. This design estimates the 5 main effects plus the mean, but it can also provide estimates of two other parameters that are known to be negligible.

A Saturated Design (SD) could be used in case of scarce resources when it is clear to the investigator which parameters are important and non-negligible. However, it turns out that the identification of an SD is a challenging problem. Numerous papers available in the literature discuss how to construct SDs under certain conditions. See Hedayat and Pesotan (1992, 2007). In addition, various computer algorithms have been developed to search for SDs in the TLFD set-up, among them SPAN and DETMAX. See Hedayat and Zhu (2011). It is worth pointing out that when an RFFD is used to estimate a certain vector parameter of interest, the estimator of each effect except the mean is a contrast in terms of the runs, and it is clear to practitioners that each estimator measures an interaction or the change in the response variable due to the variation of some factor from low to high. The common practice in the literature is to choose an SD for which the underlying design matrix is non-singular. The Ordinary Least Squares (OLS) method is then used to obtain the estimator of the vector parameter of interest.

The question we may ask is the following: “is the estimator [BLUE] of each parameter (except the common mean) in an SD model a contrast in terms of the runs?”. If the design matrix is a Hadamard matrix, then the answer is trivially ‘yes’, since the SD in that case can be seen as an RFFD. However, when the design matrix is not a Hadamard matrix, the estimator of the vector parameter is given by $\hat{\beta}_1 = (D^T D)^{-1} D^T Y = D^{-1} Y$, where $D$ is the saturated design matrix. In practice, it is desirable for practitioners to have the estimator of each estimable effect as a contrast in terms of the runs for the sake of interpretation. It is interesting to verify that it is indeed so even when the design is not based on a Hadamard matrix. This can be seen as follows.

Let $D$ be a non-singular matrix of order $n$ with its first column being the vector $1_n$ (a vector of length $n$ whose entries are all 1). Let $e_1 = (1, 0, 0, \ldots, 0)^T$ denote the column vector of length $n$. Then $De_1$ is the first column of $D$, which means that $De_1 = 1_n$. Hence, whenever $D$ is non-singular, $D^{-1}De_1 = D^{-1}1_n$, which implies that $D^{-1}1_n = e_1$. In other words, every row of $D^{-1}$ other than the first sums to zero. This is equivalent to the statement that the estimates of all model parameters [except the over-all mean] are linear observational contrasts, irrespective of the nature of the elements of the matrix $D$.
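The argument above can be checked numerically. The following is a minimal sketch in Python, using exact rational arithmetic; the 3×3 matrix `D` and the helper `inverse` are illustrative assumptions, not taken from the paper. The matrix has a first column of ones, is non-singular, and cannot be a Hadamard matrix because its order is 3.

```python
from fractions import Fraction


def inverse(matrix):
    """Invert a square matrix exactly by Gauss-Jordan elimination."""
    n = len(matrix)
    # Build the augmented matrix [A | I] with exact rational entries.
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero pivot.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        # Eliminate the pivot column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Hypothetical saturated design matrix: first column all ones, not Hadamard.
D = [[1, 1, 1],
     [1, -1, 1],
     [1, 1, -1]]

D_inv = inverse(D)
# Row sums of D^{-1} are the entries of D^{-1} 1_n, which should equal e_1.
row_sums = [sum(row) for row in D_inv]
print(row_sums)
```

The first row sum equals 1 (the mean estimator is not a contrast), while every other row of the inverse sums to zero, so each remaining estimator is a contrast in the observations.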

The rest of the paper is organized as follows. In Section 2, we develop a general theory for ‘deletion of exact number of runs’ so as to ensure the estimability of all non-negligible effects/interactions in a saturated $2^k$-factorial experiment. This is done through identification of the runs to be deleted (admissible runs for deletion), for any given collection of non-negligible effects/interactions. We give some illustrative examples in the case of a $2^3$-factorial experiment. It turns out that the admissible runs for deletion may not be unique. Therefore, it is desirable to identify the admissible set for deletion that would ensure the optimality of the design matrix under the D-optimality criterion. Thus, in Section 3, we shed more light on the algorithm developed in Section 2 and explain how it may be used to classify design matrices with respect to the absolute value of their determinants. Then, we take up some illustrative examples in the case of $2^4$-factorial experiments.

Section snippets

General theory for identification of runs for deletion — retaining estimability of non-negligible effects/interactions in a saturated design

We begin by listing several properties of a Hadamard matrix of order $N = 2^k$.

Theorem 1

Let $H_N$ be a Hadamard matrix of order $N$ that is partitioned into block matrices as $H_N = \begin{pmatrix} D & E \\ V & C \end{pmatrix}$, where $D$ and $C$ are square matrices of order $n$ and $d$, respectively, such that $n \geq d$. Then we have the following results:

  • 1. $|\det(D)| = N^{\frac{n-d}{2}} |\det(C)|$.

  • 2. If $D$ is non-singular, then $C$ is also non-singular and the inverse of $D$ is given by $D^{-1} = \frac{1}{N}\left[D - E C^{-1} V\right]^T$.

Proof

Since $H_N = \begin{pmatrix} D & E \\ V & C \end{pmatrix}$ is a Hadamard matrix, $H_N H_N^T = H_N^T H_N = N I_N$. Therefore, we have $D^T D = -V^T V + (n+d) I_n$, where $N = n + d$
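Both parts of Theorem 1 can be verified numerically on a small case. The sketch below is an illustration only: it assumes the Sylvester construction of a Hadamard matrix of order 8 and takes $C$ to be the bottom-right $d \times d$ block. The helper names (`sylvester_hadamard`, `inverse_and_det`, `check_theorem`) are hypothetical. The determinant identity is tested in squared form to stay in exact arithmetic.

```python
from fractions import Fraction


def sylvester_hadamard(k):
    """Sylvester Hadamard matrix of order 2**k as nested lists."""
    H = [[1]]
    for _ in range(k):
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H


def inverse_and_det(matrix):
    """Exact inverse and determinant via Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det  # a row swap flips the sign of the determinant
        pv = aug[col][col]
        det *= pv
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug], det


def matmul(A, B):
    """Plain matrix product of nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def check_theorem(H, d):
    """Check both parts of Theorem 1 for the partition with C of order d."""
    N = len(H)
    n = N - d
    D = [row[:n] for row in H[:n]]
    E = [row[n:] for row in H[:n]]
    V = [row[:n] for row in H[n:]]
    C = [row[n:] for row in H[n:]]
    D_inv, det_D = inverse_and_det(D)
    C_inv, det_C = inverse_and_det(C)
    # Part 1, squared: det(D)^2 == N^(n-d) * det(C)^2.
    part1 = det_D ** 2 == N ** (n - d) * det_C ** 2
    # Part 2: D^{-1} == (1/N) * (D - E C^{-1} V)^T.
    ECV = matmul(matmul(E, C_inv), V)
    schur = [[D[i][j] - ECV[i][j] for j in range(n)] for i in range(n)]
    rhs = [[schur[j][i] / N for j in range(n)] for i in range(n)]
    part2 = D_inv == rhs
    return part1, part2, abs(det_D), abs(det_C)


H8 = sylvester_hadamard(3)
for d in (1, 2, 3):
    print(d, check_theorem(H8, d))
```

For these three partitions the bottom-right block happens to be non-singular; a partition with singular $C$ would make the helper raise, which is consistent with the theorem since $D$ is then singular as well.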

A general rule for identifying admissible set

Theorem 1 states that whenever a Hadamard matrix $H_N$ is partitioned into block matrices as in Section 2, then $|\det(D)| = N^{\frac{n-d}{2}} |\det(C)|$ and the matrix $D$ is non-singular if and only if the matrix $C$ is non-singular. The take-away message in terms of a saturated design is that there is a direct relationship between a saturated design matrix $D$ and the matrix $C$. It means that when the experimenter is deciding which runs to keep in a saturated design, there are two equivalent options to choose from.
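To make this relationship concrete, the sketch below enumerates every candidate set of runs to delete in a $2^3$-factorial experiment and compares $|\det(C)|$ with $|\det(D)|$. It is a brute-force illustration, not the paper's algorithm. The factor labels 0, 1, 2, the choice of negligible effects (the interaction of factors 1 and 2, and the three-factor interaction) and the function names are all assumptions made for the example. Here $N = 8$, $n = 6$ and $d = 2$, so the theorem predicts $|\det(D)| = 64\,|\det(C)|$ for every deletion set.

```python
from itertools import combinations, product
from math import prod


def det(M):
    """Integer determinant by Laplace expansion (adequate for small matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def classify_deletions(k, negligible):
    """Map each set of deleted runs to (|det C|, |det D|).

    Rows of the model matrix are the 2**k runs, and columns are the mean,
    main effects and interactions, each indexed by a tuple of factor labels.
    """
    runs = list(product([-1, 1], repeat=k))
    effects = [s for r in range(k + 1) for s in combinations(range(k), r)]
    H = [[prod(run[i] for i in s) for s in effects] for run in runs]
    keep_cols = [j for j, s in enumerate(effects) if s not in negligible]
    drop_cols = [j for j, s in enumerate(effects) if s in negligible]
    results = {}
    for deleted in combinations(range(len(runs)), len(drop_cols)):
        kept = [i for i in range(len(runs)) if i not in deleted]
        # C: deleted runs by negligible effects; D: kept runs by retained effects.
        C = [[H[i][j] for j in drop_cols] for i in deleted]
        D = [[H[i][j] for j in keep_cols] for i in kept]
        results[deleted] = (abs(det(C)), abs(det(D)))
    return results


results = classify_deletions(3, {(1, 2), (0, 1, 2)})
admissible = [rows for rows, (abs_c, abs_d) in results.items() if abs_c != 0]
print(len(results), "candidate deletion sets,", len(admissible), "admissible")
```

Checking the small $d \times d$ matrix $C$ is much cheaper than checking the $n \times n$ matrix $D$, which is one practical reading of the two equivalent options described above.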

Concluding remarks

The construction of saturated designs for two-level factorial experiments has gained substantial interest over a long period of time. Numerous papers have been written about the classification of saturated design matrices of fixed order via the spectrum of the determinant function. Thus, the spectra of the determinant function $S_n$ for $\{-1,+1\}$-matrices of order $n$ are well-known in the literature for orders up to 11. The spectrum of order $n=8$ was due to Metropolis (1971). For $n=9$ and $n=10$, the

Acknowledgments

This work is partially supported by the US National Science Foundation (NSF Grant 1809681).

This research work was carried out while Bikas Sinha was a visitor at UIC, Chicago.
