On the nature of saturated $2^k$-factorial designs for unbiased estimation of non-negligible parameters
Introduction
Two-level factorial designs (TLFD) are widely used in scientific and industrial experimentation. Standard textbooks treat the topic at varying lengths, covering concepts such as (i) Unreplicated Full Factorials, (ii) Replicated Full Factorials, (iii) Blocking, (iv) Total, Partial and Balanced/Unbalanced Confounding, (v) Fractional Factorials, etc. Practitioners primarily use TLFD at an early stage of an experiment to screen the factors that may be involved in the system being investigated. The statistical models underlying TLFD are simple and subject to relatively weak assumptions. Each factor, whether quantitative or qualitative, is assumed to have two levels that are conveniently coded as $+1$ or $-1$ in the design matrix, which turns out to be a Hadamard matrix. Hadamard matrices have been studied extensively in the literature. As a case in point, see Friedland and Aliabadi (2018) for classical properties of Hadamard matrices. The estimators of the effects/interactions of TLFD are contrasts that are naturally simple to interpret. The effect of a factor is interpreted as a measure of the change in the response variable due to the variation of the factor from low to high, averaged over all other factor levels.
In practice, investigators postulate, for one reason or another, that certain parameters (usually higher-order interactions) are unimportant or negligible. When that is the case, it is desirable to conduct the experiment with the smallest number of runs that ensures unbiased estimation of the important parameters, that is, the non-negligible main effects and interactions of interest. Regular Fractional Factorial Designs (RFFD) are used in this kind of situation, and there is a vast literature available on RFFD. See, for example, Montgomery (1996).
In the framework of a two-level factorial design, one of the drawbacks of RFFD is that the number of runs needed to conduct the experiment is necessarily a multiple of . Thus, when the important effects and interactions to be estimated are identified beforehand, using an RFFD may lead to the use of more resources than the bare minimum needed for the estimation of the important effects and interactions. For instance, if the number of factors is and the only important effects are the main effects plus the mean then using an RFFD of resolution would require runs for the experiment. This would actually estimate the main effects plus the mean but also can provide estimates of two other parameters that are known to be negligible.
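The run-count arithmetic behind this drawback can be sketched as follows. The snippet assumes a model with the mean plus all main effects and relies on the fact that a regular two-level fraction has a power-of-2 number of runs. The function name `regular_fraction_excess` and the choice of 5 factors are illustrative assumptions, picked only because they reproduce the "two other parameters" mentioned above; they are not taken from the original text.

```python
def regular_fraction_excess(num_factors):
    """Runs of the smallest regular two-level fraction that can estimate
    the mean plus all main effects, and how many runs it has to spare."""
    num_params = num_factors + 1      # mean + one main effect per factor
    runs = 1
    while runs < num_params:          # regular fractions have 2**m runs
        runs *= 2
    return runs, runs - num_params

# 5 factors is an assumed example value, not a figure from the source.
runs, spare = regular_fraction_excess(5)
print(runs, spare)
```

The second returned value counts the runs beyond the number of non-negligible parameters; a saturated design aims to bring that number down to zero.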
A Saturated Design (SD) could be used when resources are scarce and it is clear to the investigator which parameters are important and non-negligible. However, identifying an SD turns out to be a challenging problem. Numerous papers discuss how to construct SDs under certain conditions; see Hedayat and Pesotan (1992, 2007). In addition, various computer algorithms, such as SPAN and DETMAX, have been developed to search for SDs in the TLFD set-up; see Hedayat and Zhu (2011). It is worth pointing out that when an RFFD is used to estimate a vector parameter of interest, the estimator of each effect except the mean is a contrast in terms of the runs, and it is clear to practitioners that each estimator measures an interaction or the change in the response variable due to the variation of some factor from low to high. The common practice in the literature is to choose an SD whose underlying design matrix is non-singular. The Ordinary Least Squares (OLS) method is then used to obtain the estimator of the vector parameter of interest.
The question we may ask is the following: "Is the estimator [BLUE] of each parameter (except the common mean) in an SD model a contrast in terms of the runs?" If the design matrix is a Hadamard matrix, the answer is trivially 'yes', since the SD can then be seen as an RFFD. However, when the design matrix is not a Hadamard matrix, the estimator of the vector parameter is given by $\hat{\beta} = X^{-1} y$, where $X$ is the saturated design matrix and $y$ is the vector of observations. For the sake of interpretation, practitioners want the estimator of each estimable effect to be a contrast in terms of the runs. It is interesting to verify that this is indeed so even when the design is not based on a Hadamard matrix. This can be seen as follows.
Let $X$ be a non-singular matrix of order $n$ with its first column being the vector $\mathbf{1}_n$ (a vector of length $n$ whose entries are all $1$). Let $e_1 = (1, 0, \ldots, 0)^\top$ denote the first unit column vector of length $n$. Then $X e_1$ is a vector of $1$'s, which means that $X e_1 = \mathbf{1}_n$. Hence, whenever $X$ is non-singular, $X^{-1} \mathbf{1}_n = e_1$, which implies that every row of $X^{-1}$ except the first sums to zero. This is equivalent to the statement that the estimates of all model parameters [except the overall mean] are linear observational contrasts, irrespective of the nature of the elements of the matrix $X$.
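The contrast property argued above can be checked numerically. The sketch below uses a small hypothetical design matrix (a $3 \times 3$ matrix with entries $\pm 1$ and a first column of ones, which cannot be a Hadamard matrix because its order is 3); the matrix and the variable names are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical saturated design matrix: first column all ones,
# entries +1/-1, non-singular, and not a Hadamard matrix (order 3).
X = np.array([[1,  1,  1],
              [1, -1,  1],
              [1,  1, -1]], dtype=float)

X_inv = np.linalg.inv(X)

# Summing each row of the inverse is the same as applying the inverse
# to the all-ones vector, i.e. undoing "X times the first unit vector".
row_sums = X_inv.sum(axis=1)
print(row_sums)
```

Up to rounding error, the first row sum should be 1 (the mean is not a contrast) and every other row sum should be 0, so each remaining estimator is a contrast in the observations.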
The rest of the paper is organized as follows. In Section 2, we develop a general theory for ‘deletion of exact number of runs’ so as to ensure the estimability of all non-negligible effects/interactions in a saturated -factorial experiment. This is done through identification of the runs to be deleted (admissible runs for deletion), for any given collection of non-negligible effects/interactions. We give some illustrative examples in the case of -factorial experiment. It turns out that the admissible runs for deletion may not be unique. Therefore, it is desirable to identify the admissible set for deletion that would ensure the optimality of the design matrix under the D-optimality criterion. Thus, in Section 3, we shed more light on understanding the algorithm developed in Section 2 and explain how it may be used to classify design matrices with respect to the absolute value of their determinants. Then, we take up some illustrative examples in the case of -factorial experiments.
General theory for identification of runs for deletion — retaining estimability of non-negligible effects/interactions in a saturated design
We begin by listing several properties of a Hadamard matrix of order .
Theorem 1 Let be a Hadamard matrix of order that is partitioned into block matrices as , where and are square matrices of order and respectively such that . Then we have the following results: . If is non-singular then is also non-singular and the inverse of is given by: .
Proof being a Hadamard matrix, . Therefore, we have , where
A general rule for identifying admissible set
Theorem 1 states that whenever a Hadamard matrix is partitioned into block matrices as in Section 2, then and the matrix is non-singular if and only if the matrix is non-singular. The take-away message here in terms of a saturated design is that there is a direct relationship between a saturated design matrix and the matrix . It means that when the experimenter is deciding which runs to keep in a saturated design he has two equivalent options he can choose from.
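The two equivalent options can be illustrated numerically. The sketch below is not the paper's algorithm: it assumes a Sylvester-type Hadamard matrix of order 8, treats six arbitrarily chosen columns as the non-negligible effects (a hypothetical choice), and compares the kept-runs block with the complementary deleted-runs block for every pair of deleted runs. It checks the standard identity $|\det H_{11}| = n^{\,p - n/2}\,|\det H_{22}|$ for complementary square blocks of a Hadamard matrix, which is assumed here to be the kind of relationship the theorem refers to; the names `H11`, `H22`, `sylvester_hadamard`, and `block_determinants` are introduced for illustration.

```python
from itertools import combinations

import numpy as np


def sylvester_hadamard(m):
    """Sylvester-type Hadamard matrix of order 2**m (first column all ones)."""
    h2 = np.array([[1, 1], [1, -1]])
    h = np.array([[1]])
    for _ in range(m):
        h = np.kron(h, h2)
    return h


def block_determinants(h, kept_cols, deleted_rows):
    """|det| of the kept block (H11) and of the complementary block (H22)."""
    n = h.shape[0]
    kept_rows = [i for i in range(n) if i not in deleted_rows]
    dropped_cols = [j for j in range(n) if j not in kept_cols]
    h11 = h[np.ix_(kept_rows, kept_cols)]               # saturated design matrix
    h22 = h[np.ix_(list(deleted_rows), dropped_cols)]   # deleted runs x negligible effects
    return abs(np.linalg.det(h11)), abs(np.linalg.det(h22))


H = sylvester_hadamard(3)
n = H.shape[0]
kept_cols = [0, 1, 2, 3, 4, 5]   # hypothetical set of non-negligible effects
p = len(kept_cols)

results = {}
for deleted in combinations(range(n), n - p):
    results[deleted] = block_determinants(H, kept_cols, deleted)

admissible = [d for d, (d11, d22) in results.items() if d22 > 1e-9]
print(len(admissible), "admissible deletion sets out of", len(results))
```

Checking the small block of deleted runs is much cheaper than checking the large block of retained runs, which is one practical reason to work with the deletion set when only a few runs are removed.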
Concluding remarks
The construction of saturated designs for two level factorial experiments has gained substantial interest over a long period of time. Numerous papers have been written about the classification of saturated design matrices of fixed order via the spectrum of the determinant function. Thus, the spectra of the determinant function for -matrices of order are well-known in the literature for orders up to . The spectrum of order was due to Metropolis (1971). For and , the
Acknowledgments
This work is partially supported by the US National Science Foundation (NSF Grant 1809681).
This research work was carried out while Bikas Sinha was a visitor at UIC, Chicago.
References (9)
- et al. The maximum determinant of 21 × 21 (+1, −1)-matrices and D-optimal designs. J. Statist. Plann. Inference (1987)
- et al. Tools for constructing optimal two-level factorial designs for a linear model containing main effects and one two-factor interaction. J. Statist. Plann. Inference (2007)
- Classification of small (0, 1) matrices. Linear Algebra Appl. (2006)
- et al. Linear Algebra and Matrices (2018)