
Computationally Stable Estimation Procedure for the Multivariate Linear Mixed-Effect Model and Application to Malaria Public Health Problem

Eric Houngla Adjakossa, Norbert Mahouton Hounkonnou and Grégory Nuel

Abstract

In this paper, we provide the ML (Maximum Likelihood) and the REML (REstricted ML) criteria for consistently estimating multivariate linear mixed-effects models with an arbitrary correlation structure between the random effects across dimensions, but independent (and possibly heteroscedastic) residuals. By factorizing the random effects covariance matrix, we provide an explicit expression of the profiled deviance through a reparameterization of the model. This strategy can be viewed as a generalization of the estimation procedure used by Douglas Bates and his co-authors for fitting one-dimensional linear mixed-effects models. Besides its robustness with respect to the starting points, the approach enables a numerically consistent estimate of the random effects covariance matrix, whereas classical alternatives such as the EM algorithm are usually not numerically consistent. In a simulation study, we compare the estimates obtained from the present method with the EM algorithm-based estimates. We finally apply the method to a study of the immune response to malaria in Benin.

1 Introduction

The linear mixed-effects model [1, 2, 3, 4, 5] has become a popular tool for analyzing univariate multilevel data, which arise in many areas (biology, medicine, economics, etc.), thanks to its flexibility in modeling the correlation contained in these data and to the availability of reliable and efficient software packages for fitting it [6, 7, 8, 9]. Univariate multilevel data refer to observations (or measurements) of a single variable of interest at several levels, for instance schools grouped by village and villages grouped at the city level. By analogy, multiple variables of interest measured at multiple levels characterize multivariate multilevel data. Examples include exam or test scores recorded for students across time and over several subjects (e.g. maths, biology, French, etc.). When attention is focused on the analysis of multivariate multilevel data, some research questions can only be answered by a joint analysis of two or more variables, not by separate analyses [10]. A multivariate extension of the single-response linear mixed-effects model has indeed gained increasing popularity as a flexible tool for the analysis of multivariate multilevel data [11, 12, 13, 14].

For example, in the context of the malaria immune study, fourteen antibodies against malaria were measured on children exposed to malaria-infected mosquitoes in several villages of the Republic of Benin (West Africa). One of the objectives of the malaria study was to evaluate the effect of malaria infection on the child's immunity against the disease. The antigens which characterize the child's immune status interact together, and it makes more sense to evaluate the effect of the malaria infection on all these antigens simultaneously. The only valid way of doing this evaluation is to analyze the characteristics of the joint distribution of these antigens, conditional on the malaria infection. In [15], the authors pointed out the drawbacks of performing a separate analysis instead. The malaria study is further described below and used as an illustration of the method at the end of the paper.

For the one-dimensional linear mixed-effects model, many methods for obtaining estimates of the fixed and the random effects have been proposed in the literature. These methods include Henderson's mixed model equations [16], the approach proposed by [17], as well as techniques based on two-stage regression; Searle et al. [18] and [19] give more details on the latter. Concerning the estimation of the variance parameters in the linear mixed-effects model, the methods discussed in the literature include the ANOVA method for balanced data, which uses the expected mean squares approach [20, 21]. For unbalanced data, [22] proposed the minimum norm quadratic unbiased estimation (MINQUE) method, whose estimates are translation-invariant unbiased quadratic forms of the observations. Lee and Nelder [23] suggested another method for estimating variance parameters using extended quasi-likelihood, based on gamma-log generalized linear models. Gumedze and Dunne [24] present more details on parameter estimation methods in the linear mixed-effects model. The Maximum Likelihood (ML) and the Restricted Maximum Likelihood (REML) methods are the most popular estimation methods for the linear mixed-effects model [25]. The main attraction of these methods is their ability to handle a much wider class of variance models than simple variance components [24].

In the multivariate linear mixed-effects model, ML and REML estimates are frequently obtained through iterative schemes such as the EM algorithm [14, 26, 27, 28, 29]. This avoids the difficulties related to the direct calculation of the parameters' likelihood, without increasing the implementation burden. Despite the existence of valid theorems proving the asymptotic convergence of the iterative sequences of these algorithms [27], the practical reality is often more complex. Verbeke et al. [10] explained that the main problem in fitting multivariate mixed-effects models is the loss of numerical stability as the dimension of the response variable increases. Indeed, an increase in the dimension of the response variable obviously leads to an increase in the dimension of the model's parameter vector. Adjakossa et al. [15] illustrated through simulation studies that the random effects covariance matrix obtained by the EM algorithm has the worst Mean Square Error (MSE) among the parameters. In the same paper, the authors also gave a direct estimate of the random effects covariance matrix.

In this paper, we focus on the multivariate linear mixed-effects model with an arbitrary correlation structure between the Gaussian random effects across dimensions, but independent (and possibly heteroscedastic) Gaussian residuals. In order to gain numerical stability along with a better MSE for the variance parameter estimates, and taking inspiration from the approach of [6] in the one-dimensional case, we optimize the profiled deviance obtained by partially maximizing the likelihood with respect to the fixed-effect parameters. This choice is justified by the fact that the fixed-effects vector is generally of larger dimension than the vector of variance parameters, and avoiding a direct optimization over the fixed effects decreases the complexity of the resulting optimization problem. Finally, the profiled deviance is a function of a small number of parameters and is easily optimized by any box-constrained optimizer, since some of the variance parameters are constrained. Our approach consists in directly calculating the likelihood of the model parameters after a reparameterization. This likelihood is used to obtain ML or REML estimates through the profiled deviance or the REML criterion provided below, respectively. This strategy may explain the high quality of the estimates of the fixed-effect parameters, the random effects variance parameters and the residual variance parameters. The approach may be viewed as a multi-dimensional generalization of the lme4 R package [6].

2 Multivariate linear mixed-effects model

Throughout the paper we adopt the following classical notations: $|\cdot|$ denotes the determinant of a matrix, $\|\cdot\|$ is the Euclidean norm, and $\cdot^\top$ is the matrix transpose symbol.

For the sake of simplicity, we focus on the bivariate case (d = 2) in most of the paper, but the generalization to higher dimensions (d > 2) is straightforward. Thus, in dimension 2, the model is the following:

$$y_1 = X_1\beta_1 + Z_1\gamma_1 + \varepsilon_1, \qquad y_2 = X_2\beta_2 + Z_2\gamma_2 + \varepsilon_2, \tag{1}$$

where

$$\gamma = \begin{pmatrix}\gamma_1\\ \gamma_2\end{pmatrix} \sim \mathcal{N}\left(0,\ \Gamma = \begin{pmatrix}\Gamma_1 & \Gamma_{12}\\ \Gamma_{12}^\top & \Gamma_2\end{pmatrix}\right), \qquad \varepsilon = \begin{pmatrix}\varepsilon_1\\ \varepsilon_2\end{pmatrix} \sim \mathcal{N}\left(0,\ \begin{pmatrix}\sigma_1^2 I_N & 0\\ 0 & \sigma_2^2 I_N\end{pmatrix}\right). \tag{2}$$

For $k \in \{1,2\}$, $\beta_k \in \mathbb{R}^{p_k\times 1}$ and $\gamma_k \in \mathbb{R}^{q_k\times 1}$ denote respectively the fixed-effect and the random-effect vectors of covariates, while $\varepsilon_k \in \mathbb{R}^{N\times 1}$ is the marginal residual component in dimension $k$ of the model. Moreover, $y_k \in \mathbb{R}^{N\times 1}$ is the response variable in dimension $k$, $X_k \in \mathbb{R}^{N\times p_k}$ is a matrix of covariates and $Z_k \in \mathbb{R}^{N\times q_k}$ is the random-effect design matrix. We denote $p = p_1 + p_2$ and $q = q_1 + q_2$.

The variance-covariance matrices of $\gamma_1$ and $\gamma_2$ are respectively $\Gamma_1 \in \mathbb{R}^{q_1\times q_1}$ and $\Gamma_2 \in \mathbb{R}^{q_2\times q_2}$, and must be positive semidefinite. It is then convenient to express the model in terms of the relative covariance factors $\Lambda_{\theta_1} \in \mathbb{R}^{q_1\times q_1}$ and $\Lambda_{\theta_2} \in \mathbb{R}^{q_2\times q_2}$ such that

$$\Gamma_1 = \sigma_1^2\,\Lambda_{\theta_1}\Lambda_{\theta_1}^\top \qquad\text{and}\qquad \Gamma_2 = \sigma_2^2\,\Lambda_{\theta_2}\Lambda_{\theta_2}^\top. \tag{3}$$

In eq. (3), $\sigma_1^2$ and $\sigma_2^2$ are the same marginal residual variances used in eq. (2). Using the variance-component parameters $\theta_1$ and $\theta_2$, the marginal random effects $\gamma_1$ and $\gamma_2$ are expressed as

$$\gamma_1 = \Lambda_{\theta_1} u_1, \qquad \gamma_2 = \Lambda_{\theta_2} u_2,$$

where

$$u = \begin{pmatrix}u_1\\ u_2\end{pmatrix} \sim \mathcal{N}(0, \Sigma_u), \qquad\text{with}\qquad \Sigma_u = \begin{pmatrix}\sigma_1^2 I_{q_1} & \sigma_1\sigma_2\,\rho\\ \sigma_1\sigma_2\,\rho^\top & \sigma_2^2 I_{q_2}\end{pmatrix}, \tag{4}$$

and $\rho = \operatorname{diag}(\rho, \ldots, \rho)$. For example, if we suppose that we have one intercept (I) and one slope (S) in each dimension, $\rho$ is expressed as

$$\rho = \begin{pmatrix}\operatorname{corr}(\gamma_{1I}, \gamma_{2I}) & \operatorname{corr}(\gamma_{1I}, \gamma_{2S})\\ \operatorname{corr}(\gamma_{1S}, \gamma_{2I}) & \operatorname{corr}(\gamma_{1S}, \gamma_{2S})\end{pmatrix},$$

where $\gamma_{kI}$ and $\gamma_{kS}$ are the random intercept and slope, respectively, in the $k$th dimension of the model. The bivariate linear mixed-effects model can therefore be expressed as

$$y_1 = X_1\beta_1 + Z_1\Lambda_{\theta_1}u_1 + \varepsilon_1, \qquad y_2 = X_2\beta_2 + Z_2\Lambda_{\theta_2}u_2 + \varepsilon_2, \tag{5}$$

with

$$u = \begin{pmatrix}u_1\\ u_2\end{pmatrix} \sim \mathcal{N}(0, \Sigma_u), \qquad \varepsilon = \begin{pmatrix}\varepsilon_1\\ \varepsilon_2\end{pmatrix} \sim \mathcal{N}\left(0,\ \begin{pmatrix}\sigma_1^2 I_N & 0\\ 0 & \sigma_2^2 I_N\end{pmatrix}\right), \tag{6}$$

where $I_N$ denotes the $N$-dimensional identity matrix. From now on, our objective is to estimate the parameter $\Theta = (\beta_1, \beta_2, \sigma_1^2, \sigma_2^2, \theta_1, \theta_2, \rho)$.
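To make the reparameterization of eqs. (3)–(4) concrete, the following R sketch (our own illustration, not code from the paper; make_lambda and the numerical values are hypothetical) assembles $\Lambda_{\theta_1}$, $\Lambda_{\theta_2}$, $\Sigma_u$ and the implied $\Gamma$ for the case of one random intercept and one random slope per dimension ($q_1 = q_2 = 2$).

```r
## Sketch (not from the paper): assembling the covariance structure of eqs. (3)-(4)
## for q1 = q2 = 2 (random intercept + random slope in each dimension).
make_lambda <- function(theta, q) {
  ## fill a lower-triangular relative covariance factor column-wise
  L <- matrix(0, q, q)
  L[lower.tri(L, diag = TRUE)] <- theta
  L
}

theta1 <- c(1.0, 0.3, 0.8)      # 3 free parameters for a 2x2 lower-triangular factor
theta2 <- c(0.9, 0.1, 0.5)
sigma1 <- 1.2; sigma2 <- 0.7
rho    <- matrix(c(0.6, 0.2, 0.1, 0.5), 2, 2)  # cross-dimension correlation block (example values)

Lambda1 <- make_lambda(theta1, 2)
Lambda2 <- make_lambda(theta2, 2)

## Sigma_u as in eq. (4): diagonal blocks sigma_k^2 I_{q_k}, cross block sigma1*sigma2*rho
Sigma_u <- rbind(cbind(sigma1^2 * diag(2), sigma1 * sigma2 * rho),
                 cbind(sigma1 * sigma2 * t(rho), sigma2^2 * diag(2)))

## Implied covariance of gamma = blockdiag(Lambda1, Lambda2) %*% u
A     <- rbind(cbind(Lambda1, matrix(0, 2, 2)), cbind(matrix(0, 2, 2), Lambda2))
Gamma <- A %*% Sigma_u %*% t(A)
Gamma  # its diagonal blocks equal sigma1^2 Lambda1 Lambda1' and sigma2^2 Lambda2 Lambda2'
```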

2.1 ML criterion

The objective of this section is to express the log-likelihood of the model defined in eqs. (5) and (6) given an observation y=(y1,y2).

If we denote by $U$ the random vector associated with the random effects $u$, the conditional expectation $\mu_{U|Y=y}$ of $U$ given $Y=y$ can be obtained jointly with the fixed-effects estimate $\hat\beta_{\theta,\rho,\sigma}$, conditionally on $\theta=(\theta_1,\theta_2)$, $\rho$ and $\sigma=(\sigma_1^2,\sigma_2^2)$, through the following lemma:

Lemma 2.1.

Denoting

$$Y_\sigma = \begin{pmatrix}\sigma_2\, y_1\\ \sigma_1\, y_2\end{pmatrix}, \qquad X_\sigma = \begin{pmatrix}\sigma_2 X_1 & 0\\ 0 & \sigma_1 X_2\end{pmatrix}, \qquad Z_{\sigma\theta} = \begin{pmatrix}\sigma_2 Z_1\Lambda_{\theta_1} & 0\\ 0 & \sigma_1 Z_2\Lambda_{\theta_2}\end{pmatrix},$$

$\hat\beta_{\theta,\rho,\sigma}$ and $\mu_{U|Y=y}$ satisfy

$$\begin{pmatrix} X_\sigma^\top X_\sigma & X_\sigma^\top Z_{\sigma\theta}\\ Z_{\sigma\theta}^\top X_\sigma & Z_{\sigma\theta}^\top Z_{\sigma\theta} + \sigma_1^2\sigma_2^2\,\Sigma_u^{-1}\end{pmatrix}\begin{pmatrix}\hat\beta_{\theta,\rho,\sigma}\\ \mu_{U|Y=y}\end{pmatrix} = \begin{pmatrix}X_\sigma^\top\\ Z_{\sigma\theta}^\top\end{pmatrix} Y_\sigma. \tag{7}$$
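As an illustration, the joint system of eq. (7) can be solved directly in R with dense linear algebra; the sketch below is our own, assumes the scaled quantities of Lemma 2.1 have already been constructed, and an efficient implementation would exploit sparsity as lme4 does in the univariate case.

```r
## Sketch: solving the normal equations of eq. (7) for (beta_hat, mu_{U|Y=y}),
## given Y_sigma (vector), X_sigma, Z_sigma_theta (matrices), Sigma_u, sigma1, sigma2.
solve_beta_mu <- function(Y_sigma, X_sigma, Z_sigma_theta, Sigma_u, sigma1, sigma2) {
  XtX <- crossprod(X_sigma)                    # X_sigma' X_sigma
  XtZ <- crossprod(X_sigma, Z_sigma_theta)     # X_sigma' Z_sigma_theta
  ZtZ <- crossprod(Z_sigma_theta)              # Z_sigma_theta' Z_sigma_theta
  pen <- sigma1^2 * sigma2^2 * solve(Sigma_u)  # penalty block sigma1^2 sigma2^2 Sigma_u^{-1}
  lhs <- rbind(cbind(XtX, XtZ),
               cbind(t(XtZ), ZtZ + pen))
  rhs <- c(crossprod(X_sigma, Y_sigma), crossprod(Z_sigma_theta, Y_sigma))
  sol <- solve(lhs, rhs)
  p <- ncol(X_sigma)
  list(beta_hat = sol[1:p], mu_U = sol[-(1:p)])
}
```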

By using Lemma 2.1 the log-likelihood of Θ given y is expressed in the following theorem.

Theorem 2.1.

Assume that $\hat\beta_{\theta,\rho,\sigma}$ and $\mu_{U|Y=y}$ satisfy eq. (7). Using the notations of Lemma 2.1, denote by $r(\hat\beta_{\theta,\rho,\sigma},\mu_{U|Y=y})$ the penalized residual sum of squares (knowing $\theta$, $\rho$ and $\sigma$), such that

$$r(\hat\beta_{\theta,\rho,\sigma},\mu_{U|Y=y}) = \|Y_\sigma - X_\sigma\hat\beta_{\theta,\rho,\sigma} - Z_{\sigma\theta}\,\mu_{U|Y=y}\|^2 + \sigma_1^2\sigma_2^2\,\mu_{U|Y=y}^\top\Sigma_u^{-1}\mu_{U|Y=y},$$

and let $L_{\theta,\rho,\sigma}$ and $R_X$ be two matrices satisfying the following Cholesky decompositions, respectively:

$$L_{\theta,\rho,\sigma}L_{\theta,\rho,\sigma}^\top = Z_{\sigma\theta}^\top Z_{\sigma\theta} + \sigma_1^2\sigma_2^2\,\Sigma_u^{-1},$$

and

$$\begin{pmatrix} X_\sigma^\top X_\sigma & X_\sigma^\top Z_{\sigma\theta}\\ Z_{\sigma\theta}^\top X_\sigma & L_{\theta,\rho,\sigma}L_{\theta,\rho,\sigma}^\top \end{pmatrix} = \begin{pmatrix} R_X^\top & R_{ZX}^\top\\ 0 & L_{\theta,\rho,\sigma} \end{pmatrix}\begin{pmatrix} R_X & 0\\ R_{ZX} & L_{\theta,\rho,\sigma}^\top \end{pmatrix}.$$

Then, the log-likelihood of $\Theta$ can be expressed as:

$$\ell(\beta,\theta,\rho,\sigma\,|\,y) = -\frac{r(\hat\beta_{\theta,\rho,\sigma},\mu_{U|Y=y}) + \|R_X(\beta - \hat\beta_{\theta,\rho,\sigma})\|^2}{2\sigma_1^2\sigma_2^2} - \frac{N-q}{2}\log(\sigma_1^2\sigma_2^2) - \frac{1}{2}\log(|\Sigma_u|) - \frac{1}{2}\log(|L_{\theta,\rho,\sigma}|^2). \tag{8}$$

The ML criterion expressed in eq. (8) is a differentiable concave function of $\beta$, and $\partial\ell(\beta,\theta,\rho,\sigma\,|\,y)/\partial\beta = 0$ implies that $\beta = \hat\beta_{\theta,\rho,\sigma}$. The ML estimate of $\Theta$ can then be obtained through the following corollary.

Corollary 2.1.

Consider the log-likelihood defined in eq. (8) and let $D_{ML}(\theta,\rho,\sigma\,|\,y)$ be the ML variance-parameter profiled deviance, given by

$$D_{ML}(\theta,\rho,\sigma\,|\,y) = -2\,\ell(\hat\beta_{\theta,\rho,\sigma},\theta,\rho,\sigma\,|\,y) = \frac{r(\hat\beta_{\theta,\rho,\sigma},\mu_{U|Y=y})}{\sigma_1^2\sigma_2^2} + (N-q)\log(\sigma_1^2\sigma_2^2) + \log(|\Sigma_u|) + \log(|L_{\theta,\rho,\sigma}|^2).$$

Then the ML estimators $\hat\beta$, $\hat\theta$, $\hat\rho$, $\hat\sigma$ of $\beta$, $\theta$, $\rho$ and $\sigma$ are obtained by

$$(\hat\theta, \hat\rho, \hat\sigma) = \operatorname*{arg\,min}_{\theta,\rho,\sigma}\ D_{ML}(\theta,\rho,\sigma\,|\,y) \qquad\text{and}\qquad \hat\beta = \hat\beta_{\hat\theta,\hat\rho,\hat\sigma}. \tag{9}$$

The procedure that we use to derive the ML estimate of the components of $\Theta$ is summarized in Corollary 2.1 through eq. (9). First, the ML estimate of the fixed-effects vector is computed analytically while supposing that all the variance parameters are known. This leads to $\hat\beta_{\theta,\rho,\sigma}$, which satisfies eq. (7). Then, plugging $\hat\beta_{\theta,\rho,\sigma}$ into the full likelihood yields the profiled deviance $D_{ML}(\theta,\rho,\sigma\,|\,y)$, which is a function of the variance parameters only. The numerical task of optimizing the deviance $D_{ML}(\theta,\rho,\sigma\,|\,y)$ is easier than optimizing the full likelihood $\ell(\beta,\theta,\rho,\sigma\,|\,y)$. Finally, we get $\hat\beta = \hat\beta_{\hat\theta,\hat\rho,\hat\sigma}$ once $\hat\theta$, $\hat\rho$ and $\hat\sigma$ are obtained.
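The following R sketch (our own illustration, relying on the hypothetical solve_beta_mu() helper sketched above) evaluates the profiled deviance $D_{ML}(\theta,\rho,\sigma\,|\,y)$ of Corollary 2.1 term by term for given values of the variance parameters.

```r
## Sketch: evaluating D_ML(theta, rho, sigma | y) of Corollary 2.1.
## Assumes Y_sigma, X_sigma, Z_sigma_theta, Sigma_u, sigma1, sigma2, N (observations
## per dimension) and q are given, and solve_beta_mu() is the helper sketched above.
deviance_ML <- function(Y_sigma, X_sigma, Z_sigma_theta, Sigma_u, sigma1, sigma2, N, q) {
  est <- solve_beta_mu(Y_sigma, X_sigma, Z_sigma_theta, Sigma_u, sigma1, sigma2)
  s2  <- sigma1^2 * sigma2^2
  ## penalized residual sum of squares r(beta_hat, mu_{U|Y=y})
  resid <- Y_sigma - X_sigma %*% est$beta_hat - Z_sigma_theta %*% est$mu_U
  r     <- sum(resid^2) + s2 * drop(t(est$mu_U) %*% solve(Sigma_u, est$mu_U))
  ## Cholesky factor L such that L L' = Z_sigma_theta' Z_sigma_theta + s2 * Sigma_u^{-1}
  Lmat <- t(chol(crossprod(Z_sigma_theta) + s2 * solve(Sigma_u)))
  r / s2 + (N - q) * log(s2) +
    as.numeric(determinant(Sigma_u, logarithm = TRUE)$modulus) +
    2 * sum(log(diag(Lmat)))
}
```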

2.2 REML criterion

The REML criterion [4] is obtained by integrating the likelihood of the model (given y) with respect to the fixed-effects β:

$$L(\theta,\rho,\sigma\,|\,y) = \int \exp\{\ell(\beta,\theta,\rho,\sigma\,|\,y)\}\, d\beta.$$

In eq. (8), we recognize the expression of a multivariate Gaussian distribution density in β (up to a normalization factor), and we establish that:

Theorem 2.2.

Consider the log-likelihood defined in eq. (8). By using the notations of Theorem 2.1, the REML profiled deviance, defined by $D_{REML}(\theta,\rho,\sigma\,|\,y) = -2\log L(\theta,\rho,\sigma\,|\,y)$, is given by

$$D_{REML}(\theta,\rho,\sigma\,|\,y) = \frac{r(\hat\beta_{\theta,\rho,\sigma},\mu_{U|Y=y})}{\sigma_1^2\sigma_2^2} + (N-p-q)\log(\sigma_1^2\sigma_2^2) + \log(|\Sigma_u|) + \log(|L_{\theta,\rho,\sigma}|^2) + \log(|R_X|^2), \tag{10}$$

and the REML estimators $\hat\beta_R$, $\hat\theta_R$, $\hat\rho_R$, $\hat\sigma_R$ of $\beta$, $\theta$, $\rho$ and $\sigma$ are characterized by

$$(\hat\theta_R, \hat\rho_R, \hat\sigma_R) = \operatorname*{arg\,min}_{\theta,\rho,\sigma}\ D_{REML}(\theta,\rho,\sigma\,|\,y) \qquad\text{and}\qquad \hat\beta_R = \operatorname*{arg\,max}_{\beta}\ \ell(\beta, \hat\theta_R, \hat\rho_R, \hat\sigma_R\,|\,y). \tag{11}$$

The ML and the REML deviance optimization tasks can be solved by using any box-constrained optimizer. In this paper, we use for example the optimization function nlminb available in the R software.
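For instance, a minimal call could look as follows, assuming a wrapper deviance_from_par() (a hypothetical name) that rebuilds the scaled matrices from a stacked parameter vector and returns $D_{ML}$; the box constraints keep the residual standard deviations positive and the correlations within [−1, 1].

```r
## Sketch: box-constrained minimization of the profiled deviance with nlminb.
## deviance_from_par(par) is a hypothetical wrapper mapping the stacked vector
## par = (theta1, theta2, rho, sigma1, sigma2) to D_ML(theta, rho, sigma | y).
n_theta1 <- 3; n_theta2 <- 3; n_rho <- 4          # example sizes for q1 = q2 = 2
par0  <- c(rep(1, n_theta1), rep(1, n_theta2), rep(0, n_rho), 1, 1)
lower <- c(rep(-Inf, n_theta1 + n_theta2), rep(-1, n_rho), 1e-8, 1e-8)
upper <- c(rep( Inf, n_theta1 + n_theta2), rep( 1, n_rho), Inf, Inf)

fit <- nlminb(start = par0, objective = deviance_from_par,
              lower = lower, upper = upper,
              control = list(iter.max = 1000, eval.max = 2000))
fit$par        # estimated variance parameters
fit$objective  # minimized deviance
```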

3 Simulation studies

In this section, the computational stability of the procedure is illustrated through simulation studies, and the present estimation procedure is compared with the estimation procedure based on the EM algorithm. For the sake of simplicity, the simulation studies are performed using simulated bivariate longitudinal data sets. In the following paragraph, we explain how we chose the parameters that have been used to simulate the "working" data sets.

The working data sets. We suppose that we are following up a sample of subjects in order to evaluate how the weight and the height of the individuals of this population are jointly explained by sex, a nutrition score (Nscore), and age. Each subject i is seen $n_i$ times over the course of the survey, and at each visit his/her height, weight and nutrition score are measured. We randomly assign sex, age and nutrition score using uniform distributions: a balanced sex ratio, $\mathcal{U}([18,37])$ for age, and $\mathcal{U}([20,50])$ for the nutrition score. All the computations in this paper are done using the R software. The model under which we simulate the data for individual i = 1,…,n is the following:

$$\begin{aligned}
\text{weight}_i &= (1_{n_i}, \text{sex}_i, \text{Nscore}_i, \text{age}_i)\,\beta_1 + (1_{n_i}, \text{Nscore}_i)\,\gamma_{1i} + \varepsilon_{1i},\\
\text{height}_i &= (1_{n_i}, \text{sex}_i, \text{Nscore}_i, \text{age}_i)\,\beta_2 + (1_{n_i}, \text{Nscore}_i)\,\gamma_{2i} + \varepsilon_{2i},
\end{aligned} \tag{12}$$

where

$$\gamma_i = \begin{pmatrix}\gamma_{1i}\\ \gamma_{2i}\end{pmatrix} \sim \mathcal{N}(0, \bar\Gamma), \qquad \varepsilon_{1i} \sim \mathcal{N}(0, \sigma_1^2 I_{n_i}), \qquad \varepsilon_{2i} \sim \mathcal{N}(0, \sigma_2^2 I_{n_i}), \qquad \gamma_i \perp \varepsilon_{1i} \perp \varepsilon_{2i}. \tag{13}$$

In eq. (12), $\text{weight}_i$ is a vector (of length $n_i$) that contains the successive weights of subject $i$. The random effect related to the dependent variable weight or height is a vector composed of one random intercept and one random slope in the direction of the covariate Nscore. The total number of observations is denoted by $N = \sum_{i=1}^{n} n_i$. The parameters of the model described by eqs. (12) and (13) are $\beta_1$, $\beta_2$, $\bar\Gamma$, $\sigma_1$ and $\sigma_2$.

The true values of β1,β2,σ1 and σ2 are reported in the first column of Table 2. The matrix Γˉ was chosen such that it is positive definite with the following form:

$$\bar\Gamma = \begin{pmatrix}\eta_1^2 & \rho_\eta\,\eta_1\eta_2 & \rho\,\eta_1\tau_1 & \rho\,\eta_1\tau_2\\ \rho_\eta\,\eta_1\eta_2 & \eta_2^2 & \rho\,\eta_2\tau_1 & \rho\,\eta_2\tau_2\\ \rho\,\eta_1\tau_1 & \rho\,\eta_2\tau_1 & \tau_1^2 & \rho_\tau\,\tau_1\tau_2\\ \rho\,\eta_1\tau_2 & \rho\,\eta_2\tau_2 & \rho_\tau\,\tau_1\tau_2 & \tau_2^2\end{pmatrix}.$$

In order to have a fairly strong correlation between the marginal random effects, we set ρ = 0.8 and randomly chose all other parameters involved in the expression of $\bar\Gamma$. The matrix $\bar\Gamma$ that resulted from this procedure is

$$\bar\Gamma = \begin{pmatrix}27.77 & 18.80 & 41.70 & 4.93\\ 18.80 & 36.00 & 47.47 & 5.62\\ 41.70 & 47.47 & 97.81 & 8.91\\ 4.93 & 5.62 & 8.91 & 1.37\end{pmatrix}. \tag{14}$$
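For reproducibility, the data-generating mechanism of eqs. (12)–(14) can be coded along the following lines (a sketch, not the authors' code; the layout of the random-effects vector, intercept first and Nscore slope second within each dimension, is our assumption).

```r
## Sketch: simulating one subject's trajectories under eqs. (12)-(14).
## Gamma_bar is the 4x4 matrix of eq. (14); beta1 and beta2 are length-4 vectors of
## true fixed effects; sigma1 and sigma2 are the residual standard deviations.
library(MASS)  # for mvrnorm()

simulate_subject <- function(n_i, beta1, beta2, Gamma_bar, sigma1, sigma2) {
  sex    <- rbinom(1, 1, 0.5)                 # balanced sex ratio
  age    <- runif(1, 18, 37)                  # U([18, 37])
  Nscore <- runif(n_i, 20, 50)                # nutrition score measured at each visit
  X <- cbind(1, sex, Nscore, age)             # fixed-effect design (intercept, sex, Nscore, age)
  Z <- cbind(1, Nscore)                       # random intercept + random slope on Nscore
  g <- mvrnorm(1, mu = rep(0, 4), Sigma = Gamma_bar)   # (gamma_1i, gamma_2i), assumed ordering
  weight <- X %*% beta1 + Z %*% g[1:2] + rnorm(n_i, 0, sigma1)
  height <- X %*% beta2 + Z %*% g[3:4] + rnorm(n_i, 0, sigma2)
  data.frame(sex, age, Nscore, weight = drop(weight), height = drop(height))
}
```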

3.1 Estimates’ performances

One practical way to show the numerical consistency of an estimator is to compute its Mean Square Error (MSE): if the MSE of an estimator tends to zero asymptotically, the estimator converges in probability and is then consistent. We simulated datasets with gradually increasing sample sizes, with n ∈ {50, 60, 100, 300} and N ∈ {600, 1000, 3000}. In order to account for the unbalanced nature of the longitudinal data observed in practice, we distributed the N observations among the n individuals using a multinomial distribution under the constraint that $n_i \ge 2$ for every individual i (using the rejection method, as sketched below). For each combination of (n, N) we performed 100 replications, and derived the empirical mean and 95 % CI (confidence interval) of the estimators' MSE.
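The unbalanced allocation of the N observations among the n subjects can be drawn as in the sketch below (our own illustration of the rejection step described above).

```r
## Sketch: allocate N observations among n subjects with a multinomial draw,
## rejecting draws until every subject has at least 2 observations.
allocate_ni <- function(N, n) {
  repeat {
    ni <- as.vector(rmultinom(1, size = N, prob = rep(1 / n, n)))
    if (all(ni >= 2)) return(ni)
  }
}
set.seed(1)
ni <- allocate_ni(N = 600, n = 50)
summary(ni)  # unbalanced cluster sizes, all >= 2
```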

One should note that the chosen simulation design is challenging, since reaching the asymptotic regime usually requires $n_i/n \to \infty$ [30]. Indeed, the condition $n_i/n \to \infty$ implies that $n_i > n$, which is seldom satisfied in practical studies. In our simulations we do not have $n_i \ge n$ in general, but the parameters are nevertheless generally well estimated, as shown in Table 1. This illustrates not only the good practical properties of the parameter estimators but also the improved computational stability of our procedure.

Furthermore, reaching the asymptotic regime requires not only $n \to \infty$ and $N \to \infty$ but also $N/n \to \infty$ [30]. This means that one needs a sufficient total number of observations, a sufficient number of levels of the grouping factor, and a sufficient number of observations within each level of the grouping factor. For example, in Table 1, when the total number of observations is N = 1000, the MSE of $\bar\Gamma$ is better for n = 100, 0.47 [0.03; 1.29], than for n = 300, 0.69 [0.06; 1.94]. Observing Table 1, it is clear that as the number of observations increases along with a reasonable number of subjects, the MSE tends to 0. The estimation procedure discussed in this paper may therefore be named Numerically Consistent Estimation (procedure) for the Multivariate Linear Mixed-Effects model (NCEmlme).

Table 1:

MSE in the simulation study. Mean Square Error of estimators with 95 % CI estimated on 100 replications for values of n ∈ {50, 60, 100, 300} and N ∈ {600, 1000, 3000}.

| Parameter | n | N = 600 | N = 1000 | N = 3000 |
|---|---|---|---|---|
| β1 | 50 | 2.43 (0.11–7.11) | 1.89 (0.22–4.89) | 1.02 (0.07–2.44) |
| | 60 | 2.57 (0.14–7.87) | 2.13 (0.26–5.55) | 0.77 (0.10–2.09) |
| | 100 | 2.27 (0.16–5.61) | 1.55 (0.14–5.17) | 0.71 (0.04–1.85) |
| | 300 | 3.16 (0.14–10.54) | 1.70 (0.10–4.55) | 0.51 (0.04–1.36) |
| β2 | 50 | 5.50 (0.09–16.35) | 3.26 (0.02–11.64) | 2.06 (0.09–5.98) |
| | 60 | 5.06 (0.12–15.09) | 3.22 (0.02–10.24) | 1.78 (0.10–5.62) |
| | 100 | 4.33 (0.03–13.17) | 2.37 (0.02–6.89) | 1.06 (0.02–3.60) |
| | 300 | 4.58 (0.18–15.92) | 2.43 (0.05–7.39) | 0.90 (0.04–2.88) |
| σ1 | 50 | 0.03 (0.00–0.14) | 0.02 (0.00–0.09) | 0.00 (0.00–0.02) |
| | 60 | 0.04 (0.00–0.11) | 0.02 (0.00–0.08) | 0.00 (0.00–0.02) |
| | 100 | 0.03 (0.00–0.13) | 0.01 (0.00–0.06) | 0.00 (0.00–0.01) |
| | 300 | 0.05 (0.00–0.23) | 0.03 (0.00–0.13) | 0.00 (0.00–0.02) |
| σ2 | 50 | 0.04 (0.00–0.15) | 0.03 (0.00–0.08) | 0.00 (0.00–0.02) |
| | 60 | 0.04 (0.00–0.18) | 0.03 (0.00–0.12) | 0.01 (0.00–0.03) |
| | 100 | 0.06 (0.00–0.18) | 0.03 (0.00–0.11) | 0.01 (0.00–0.03) |
| | 300 | 0.13 (0.00–0.45) | 0.04 (0.00–0.15) | 0.01 (0.00–0.05) |
| Γ̄ | 50 | 0.90 (0.12–2.41) | 0.62 (0.06–1.41) | 0.45 (0.04–1.14) |
| | 60 | 1.07 (0.06–2.64) | 0.68 (0.08–1.98) | 0.25 (0.03–0.66) |
| | 100 | 0.90 (0.05–2.40) | 0.47 (0.03–1.29) | 0.21 (0.02–0.71) |
| | 300 | 1.45 (0.19–4.39) | 0.69 (0.06–1.94) | 0.17 (0.02–0.57) |

3.2 Comparison with EM-based estimates

In this section, we compare the estimation procedure based on the EM algorithm with NCEmlme. For both algorithms, we consider two initialization schemes: naive or advised. For the naive initialization, the starting parameters are chosen at random (without specific control); for the advised one, the starting values are those obtained by fitting each dimension of the bivariate model separately (see the sketch after this paragraph). For each replication, the starting values are the same for the NCEmlme and the EM-based algorithms. For each algorithm, the number of iterations required for convergence is also reported. Our methodology consists in simulating one hundred longitudinal datasets of size (N = 3000, n = 300) and fitting the model to each of these datasets using the EM-based algorithm and NCEmlme, respectively. This allows us to compute both the empirical mean and the 95 % CI of the one hundred estimates in each case (naive and advised starting values). The results are presented in Table 2 and Table 3.
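The "advised" starting values can be obtained from two separate univariate fits, for example with the lme4 package; the sketch below (our own illustration, with an assumed long-format data frame dat) extracts per-dimension fixed effects, residual standard deviations and random-effect covariance matrices to seed the bivariate fit.

```r
## Sketch: "advised" starting values from separate univariate lme4 fits.
## dat is assumed to be a long data frame with one row per visit and columns
## id, sex, Nscore, age, weight, height.
library(lme4)

fit1 <- lmer(weight ~ sex + Nscore + age + (1 + Nscore | id), data = dat, REML = FALSE)
fit2 <- lmer(height ~ sex + Nscore + age + (1 + Nscore | id), data = dat, REML = FALSE)

beta_start   <- c(fixef(fit1), fixef(fit2))   # starting fixed effects
sigma_start  <- c(sigma(fit1), sigma(fit2))   # starting residual SDs
Gamma1_start <- as.matrix(VarCorr(fit1)$id)   # within-dimension random-effect covariance, dim 1
Gamma2_start <- as.matrix(VarCorr(fit2)$id)   # within-dimension random-effect covariance, dim 2
## the cross-dimension correlation block has no univariate counterpart and can start at 0
```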

Table 2 contains the empirical means of the estimates with their 95 % CI, together with the minimum, maximum and average numbers of iterations. Table 3 contains the empirical relative errors of the estimators with their 95 % CI. The results show that, in the case of naive initialization, the NCEmlme estimators outperform the EM estimators. For example, the component of $\beta_1$ whose true value is 14.00 is well estimated by NCEmlme, 14.02 [13.27; 14.45] with an empirical relative error of 0.02 [0.00; 0.04], but poorly estimated by EM, −2.05 [−4.70; 0.40] with an empirical relative error of 1.14 [1.01; 1.32]. In the case of the advised initialization, both NCEmlme and EM perform well, but NCEmlme converges faster (64 iterations on average) than EM (169 iterations on average). The number of iterations required by NCEmlme with advised initialization ranges from 48 to 89, compared to 56 to 103 for the naive initialization. NCEmlme with naive initialization therefore needs more iterations to converge than NCEmlme with advised initialization. This is expected, and may be explained by the fact that the advised initialization values contain some information from the data of interest, whereas the naive starting points do not.

Table 2:

Comparison of EM-based and NCEmlme estimates in the simulation study. EM-based estimates compared with NCEmlme estimates on the same data sets. Empirical estimates with their 95 % CI and the number of iterations required for convergence.

Entries give the empirical mean with its 95 % CI.

| Parameter | Value | NCEmlme (naive) | EM (naive) | NCEmlme (advised) | EM (advised) |
|---|---|---|---|---|---|
| β1 | 50.67 | 50.79 (49.14; 52.11) | 13.47 (−76.70; 43.37) | 50.80 (49.15; 52.12) | 50.78 (49.09; 52.01) |
| | −4.80 | −5.00 (−8.39; −3.66) | −4.79 (−8.08; −3.42) | −5.02 (−8.39; −3.66) | −4.98 (−8.38; −3.65) |
| | 14.00 | 14.02 (13.27; 14.45) | −2.05 (−4.70; 0.40) | 14.02 (13.28; 14.45) | 14.02 (13.27; 14.45) |
| | 2.70 | 2.70 (2.66; 2.72) | 2.69 (2.66; 2.72) | 2.70 (2.66; 2.72) | 2.70 (2.66; 2.72) |
| β2 | 13.20 | 13.65 (11.79; 15.06) | −84.47 (−114.28; −50.63) | 13.65 (11.79; 15.07) | 13.68 (11.81; 15.14) |
| | −2.80 | −2.80 (−4.74; −0.43) | −2.75 (−4.90; 0.21) | −2.81 (−4.79; −0.43) | −2.85 (−4.80; −0.51) |
| | 27.00 | 27.00 (26.87; 27.10) | 0.90 (−1.62; 2.68) | 27.00 (26.87; 27.10) | 27.00 (26.87; 27.10) |
| | 1.70 | 1.68 (1.64; 1.71) | 1.68 (1.64; 1.71) | 1.68 (1.64; 1.71) | 1.68 (1.64; 1.71) |
| σ1 | 5.80 | 5.79 (5.62; 5.92) | 5.78 (5.64; 5.98) | 5.78 (5.64; 5.92) | 5.79 (5.65; 5.94) |
| σ2 | 7.60 | 7.61 (7.34; 7.74) | 7.59 (7.33; 7.73) | 7.61 (7.34; 7.73) | 7.63 (7.36; 7.73) |
| Nbr. of iterations (min) | | 56 | 63 | 48 | 14 |
| Nbr. of iterations (mean) | | 71 | 109 | 64 | 169 |
| Nbr. of iterations (max) | | 103 | 157 | 89 | 645 |
Table 3:

Comparison of EM-based and NCEmlme relative errors on the estimates in the simulation study. EM-based estimates compared with NCEmlme estimates on the same data sets. Empirical relative error of the estimates with their 95 % CI.

Entries give the empirical relative error with its 95 % CI.

| Parameter | Value | NCEmlme (naive) | EM (naive) | NCEmlme (advised) | EM (advised) |
|---|---|---|---|---|---|
| β1 | 50.67 | 0.01 (0.00; 0.03) | 0.73 (0.01; 2.18) | 0.01 (0.00; 0.03) | 0.01 (0.00; 0.03) |
| | −4.80 | 0.21 (0.02; 0.32) | 0.21 (0.00; 0.32) | 0.21 (0.02; 0.32) | 0.21 (0.00; 0.33) |
| | 14.00 | 0.02 (0.00; 0.04) | 1.14 (1.01; 1.32) | 0.02 (0.00; 0.04) | 0.02 (0.00; 0.04) |
| | 2.70 | 0.00 (0.00; 0.01) | 0.00 (0.00; 0.01) | 0.00 (0.00; 0.01) | 0.00 (0.00; 0.01) |
| β2 | 13.20 | 0.07 (0.00; 0.14) | 7.39 (4.19; 9.40) | 0.07 (0.00; 0.14) | 0.07 (0.00; 0.14) |
| | −2.80 | 0.43 (0.00; 0.84) | 0.43 (0.02; 1.07) | 0.43 (0.00; 0.84) | 0.43 (0.00; 0.81) |
| | 27.00 | 0.00 (0.00; 0.00) | 0.96 (0.88; 1.02) | 0.00 (0.00; 0.00) | 0.00 (0.00; 0.00) |
| | 1.70 | 0.01 (0.00; 0.02) | 0.01 (0.00; 0.02) | 0.01 (0.00; 0.02) | 0.01 (0.00; 0.03) |
| σ1 | 5.80 | 0.01 (0.00; 0.02) | 0.01 (0.00; 0.03) | 0.01 (0.00; 0.02) | 0.01 (0.00; 0.03) |
| σ2 | 7.60 | 0.01 (0.00; 0.02) | 0.01 (0.00; 0.04) | 0.01 (0.00; 0.02) | 0.01 (0.00; 0.02) |

The empirical mean of the random effects covariance matrix, Γˉ, is well estimated with advised initializations:

$$\bar\Gamma_{adv}^{NCEmlme} = \begin{pmatrix}25.74 & 16.34 & 35.75 & 4.50\\ 16.34 & 34.83 & 43.97 & 5.37\\ 35.75 & 43.97 & 82.44 & 8.43\\ 4.50 & 5.37 & 8.43 & 1.32\end{pmatrix}, \quad\text{with}\quad \sigma_{\Gamma_{adv}}^{NCEmlme} = \begin{pmatrix}5.73 & 2.74 & 4.45 & 0.61\\ 2.74 & 2.59 & 3.96 & 0.44\\ 4.45 & 3.96 & 12.07 & 0.80\\ 0.61 & 0.44 & 0.80 & 0.11\end{pmatrix},$$

and

$$\bar\Gamma_{adv}^{EM} = \begin{pmatrix}23.66 & 16.45 & 32.60 & 4.45\\ 16.45 & 34.82 & 43.39 & 5.39\\ 32.61 & 43.39 & 75.16 & 8.66\\ 4.45 & 5.39 & 8.65 & 1.31\end{pmatrix}, \quad\text{with}\quad \sigma_{\Gamma_{adv}}^{EM} = \begin{pmatrix}7.49 & 2.71 & 5.29 & 0.76\\ 2.71 & 2.59 & 3.71 & 0.45\\ 5.30 & 3.71 & 13.36 & 0.80\\ 0.76 & 0.45 & 0.80 & 0.11\end{pmatrix}.$$

The matrices $\sigma_{\Gamma_{adv}}^{NCEmlme}$ and $\sigma_{\Gamma_{adv}}^{EM}$ contain the standard deviations of the entries of $\bar\Gamma_{adv}^{NCEmlme}$ and $\bar\Gamma_{adv}^{EM}$, respectively. The empirical standard deviations of the entries of $\bar\Gamma$ with the largest magnitudes are larger with EM than with NCEmlme. For example, the standard deviation of the estimate of $\bar\Gamma_{11} = 27.77$ is 7.49 for EM, but 5.73 for NCEmlme. The same remark holds for the standard deviations of $\bar\Gamma_{31}$, $\bar\Gamma_{32}$ and $\bar\Gamma_{33}$ when comparing EM and NCEmlme. This may be explained by the fact that the NCEmlme procedure is more numerically stable than the EM procedure. In the case of naive initializations, NCEmlme still provides a well-estimated empirical mean of $\bar\Gamma$, whereas the estimate of $\bar\Gamma$ provided by EM is very poor (we choose not to show it here).

$$\bar\Gamma_{naive}^{NCEmlme} = \begin{pmatrix}24.76 & 16.18 & 34.43 & 4.47\\ 16.18 & 34.77 & 43.91 & 5.35\\ 34.43 & 43.91 & 80.67 & 8.40\\ 4.47 & 5.35 & 8.40 & 1.32\end{pmatrix}, \quad\text{with}\quad \sigma_{\Gamma_{naive}}^{NCEmlme} = \begin{pmatrix}7.58 & 2.67 & 6.76 & 0.61\\ 2.67 & 2.54 & 3.91 & 0.42\\ 6.76 & 3.91 & 13.37 & 0.80\\ 0.61 & 0.42 & 0.80 & 0.11\end{pmatrix}.$$

Comparing $\bar\Gamma_{naive}^{NCEmlme}$ with $\bar\Gamma_{adv}^{NCEmlme}$, and $\sigma_{\Gamma_{naive}}^{NCEmlme}$ with $\sigma_{\Gamma_{adv}}^{NCEmlme}$, shows a slight difference which reveals a small sensitivity of NCEmlme to the starting values. This may be corrected by performing more than one evaluation of the model's deviance.

For all the simulation studies, we used the ML deviance criterion and minimized it with the nlminb function of the R software; the estimates reported are therefore ML estimates. We do not provide an application of the REML estimators in this paper.

4 Application on malaria dataset

4.1 Data description

The data that we analyze here come from a study conducted from June 2007 to January 2010 in nine villages (Avamé centre, Gbédjougo, Houngo, Anavié, Dohinoko, Gbétaga, Tori Cada Centre, Zébè and Zoungoudo) of the Tori Bossito area (Southern Benin), where P. falciparum is the commonest malaria species (95 %) [31]. The aim of this study was to evaluate the determinants of malaria incidence during the first months of life of children in Benin.

Mothers (n = 620) were enrolled at delivery and their newborns were actively followed up during their first year of life. A questionnaire was administered to gather information on the women's characteristics (age, parity, use of Intermittent Preventive Treatment during pregnancy (IPTp) and bed net possession) and on their pregnancy. After delivery, thick and thin placental blood smears were examined to detect placental infection, defined by the presence of asexual forms of P. falciparum. Maternal peripheral blood as well as cord blood were collected. At birth, the newborn's weight and length were measured and the gestational age was estimated.

During the follow-up of the newborns, axillary temperature was measured weekly. In case of a temperature higher than 37.5 °C, mothers were told to bring their children to the health center, where a questionnaire was filled out, a rapid diagnostic test (RDT) for malaria was performed and a thick blood smear (TBS) was made. Symptomatic malaria cases, defined as fever (>37.5 °C) with a positive TBS and/or RDT, were treated with an artemisinin-based combination. TBS were systematically made every month to detect asymptomatic infections. Every three months (at 3, 6, 9, 12, 15 and 18 months of age), venous blood samples were collected from the infants to quantify the level of antibodies against promising malaria vaccine candidate antigens. Finally, the environmental risk of exposure to malaria was modeled for each child, derived from a statistical predictive model based on climatic and entomological parameters and on characteristics of the children's immediate surroundings.

Concerning the antibody quantification, two recombinant P. falciparum antigens were used to measure IgG subclass (IgG1 and IgG3) antibodies: the recombinant antigens MSP2 (3D7 and FC27) were from La Trobe University [32, 33]. GLURP-R0 (amino acids 25–514, F32 strain) and GLURP-R2 (amino acids 706–1178, F32 strain) were also expressed. The antibodies were quantified in plasma at the different time points, and the ADAMSEL FLPb039 software (http://www.malariaresearch.eu/content/software) was used to automatically analyze the ELISA optical densities (OD), leading to antibody concentrations in µg/mL.

In this paper, we use a subset of these data and, for data protection reasons, we rename the proteins used in the study. We thus call A1, A2, B and C the proteins that we consider, which are related to the IgG1 and IgG3 antibody responses mentioned above. The information contained in the multivariate longitudinal malaria dataset is described in Table 4, where Y denotes one of the following antigens:

IgG1_A1, IgG3_A1, IgG1_A2, IgG3_A2, IgG1_B, IgG3_B, IgG1_C, IgG3_C.    (15)
Table 4:

Variables of the Malaria dataset.

| N | Variable | Description |
|---|---|---|
| 1 | id | Child ID |
| 2 | conc.Y | Concentration of Y |
| 3 | conc_CO.Y | Measured concentration of Y in the umbilical cord blood |
| 4 | conc_M3.Y | Predicted concentration of Y in the child's peripheral blood at 3 months |
| 5 | ap | Placental apposition |
| 6 | hb | Hemoglobin level |
| 7 | inf_trim | Number of malaria infections in the previous 3 months |
| 8 | pred_trim | Quarterly average number of mosquitoes the child is exposed to |
| 9 | nutri_trim | Quarterly average nutrition score |

4.2 Data analysis

The aim of the data analysis is to evaluate the effect of the malaria infection on the child’s immune acquisition (against malaria). Since the antigens which characterize the child’s immune status interact together in the human body, we analyze the characteristics of the joint distribution of these antigens, conditionally on the malaria infection and other factors of interest. The dependent variables are then provided by conc.Y (Table 4) which describes the level of the antigen Y in the children at 3, 6, 9, 12, 15 and 18 months. All other variables in Table 4 are covariates. We then have d = 8 dependent variables which describe the longitudinal profile (in the child) of the proteins listed in eq. (15).

To illustrate the stability of our approach, we fit a bivariate model (see eqs. (16) and (17)) to the data, with IgG1_A1 and IgG1_A2 as dependent variables:

$$\begin{aligned}
\text{conc.IgG1\_A1} &= (1, \text{ap}, \text{conc\_CO.IgG1\_A1}, \text{conc\_M3.IgG1\_A1}, \text{hb}, \text{inf\_trim}, \text{pred\_trim}, \text{nutri\_trim})\,\beta_1 + (1, \text{inf\_trim})\,\gamma_1 + \varepsilon_1,\\
\text{conc.IgG1\_A2} &= (1, \text{ap}, \text{conc\_CO.IgG1\_A2}, \text{conc\_M3.IgG1\_A2}, \text{hb}, \text{inf\_trim}, \text{pred\_trim}, \text{nutri\_trim})\,\beta_2 + (1, \text{inf\_trim})\,\gamma_2 + \varepsilon_2,
\end{aligned} \tag{16}$$

with

$$\gamma = (\gamma_1^\top, \gamma_2^\top)^\top \sim \mathcal{N}(0, \bar\Gamma), \qquad \varepsilon = (\varepsilon_1^\top, \varepsilon_2^\top)^\top \sim \mathcal{N}\left(0,\ \begin{pmatrix}\sigma_1^2 I & 0\\ 0 & \sigma_2^2 I\end{pmatrix}\right). \tag{17}$$

Our strategy is to (1) fit the model to the data by running the NCEmlme algorithm from 25 different naive starting points, and (2) retain the estimates related to the best likelihood (the minimum of the 25 deviances) as the reference parameters and compute the estimators' MSE using the 24 other estimates; a sketch of this multi-start scheme is given below. This allows us to evaluate how sensitive the NCEmlme algorithm is to the starting points. The results are contained in Table 5. Based on these results, the influence of the starting points on the NCEmlme algorithm is very low (see the MSE columns in Table 5). The estimated random effects covariance matrix is

$$\Gamma = \begin{pmatrix}1.27 & 0.53 & 1.57 & 0.71\\ 0.53 & 0.49 & 0.71 & 0.65\\ 1.57 & 0.71 & 1.93 & 0.82\\ 0.71 & 0.65 & 0.82 & 0.77\end{pmatrix}, \tag{18}$$

with an MSE of 0.0095.
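The multi-start strategy can be scripted as in the sketch below (fit_ncemlme(), random_naive_start() and malaria_data are hypothetical names standing for the NCEmlme fitting wrapper, a random starting-point generator and the analysis dataset; this is not the authors' code).

```r
## Sketch: 25 naive starting points, keep the best fit, and use the remaining 24
## fits to measure the sensitivity to initialization (as an MSE around the best estimate).
set.seed(2019)
starts <- replicate(25, random_naive_start(), simplify = FALSE)  # hypothetical generator
fits   <- lapply(starts, function(s) fit_ncemlme(malaria_data, start = s))

best_id <- which.min(sapply(fits, `[[`, "deviance"))
best    <- fits[[best_id]]   # estimates associated with the smallest deviance
others  <- fits[-best_id]

## MSE of the random-effects covariance estimates around the best one
mse_Gamma <- mean(sapply(others, function(f) mean((f$Gamma - best$Gamma)^2)))
mse_Gamma
```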

Table 5:

Bivariate and univariate linear mixed-effects model fitted to the malaria proteins. The parameters are estimated under the bivariate and the univariate models, respectively. The MSE are obtained for the bivariate model using 25 different naive starting points.

| Covariates | conc.IgG1_A1: Joint estim. | conc.IgG1_A1: MSE | conc.IgG1_A1: Indep. estim. | conc.IgG1_A2: Joint estim. | conc.IgG1_A2: MSE | conc.IgG1_A2: Indep. estim. |
|---|---|---|---|---|---|---|
| Intercept | 0.648 | 9.05×10⁻⁵ | 0.451 | 0.032 | 3.18×10⁻⁵ | 1.794 |
| ap | 0.130 | 1.06×10⁻⁵ | 0.110 | 0.038 | 1.04×10⁻⁶ | 0.324 |
| conc_CO.IgG1_A1 | 0.127 | 1.68×10⁻⁶ | 0.174 | | | |
| conc_M3.IgG1_A1 | 0.151 | 9.85×10⁻⁶ | 0.130 | | | |
| conc_CO.IgG1_A2 | | | | 0.092 | 6.44×10⁻⁷ | 0.044 |
| conc_M3.IgG1_A2 | | | | 0.117 | 2.22×10⁻⁷ | 0.137 |
| hb | 0.166 | 3.22×10⁻⁷ | 0.154 | 0.245 | 1.35×10⁻⁷ | 0.335 |
| inf_trim | 0.398 | 1.89×10⁻⁶ | 0.373 | 0.785 | 5.09×10⁻⁷ | 0.709 |
| pred_trim | 0.002 | 5.25×10⁻⁸ | 0.001 | 0.017 | 1.49×10⁻⁸ | 0.019 |
| nutri_trim | 0.072 | 5.81×10⁻⁶ | 0.046 | 0.271 | 3.71×10⁻⁵ | 0.106 |
| σ1 and σ2 | 1.345 | 4.96×10⁻⁶ | 1.426 | 1.495 | 2.42×10⁻⁵ | 1.672 |

In this kind of study, it is often useful to characterize the subjects (here, the children) by constructing their profiles. This is done by computing the expected random-effects vector (i.e. $E[\gamma\,|\,y]$). We compute here the expected random-effects vector for each child under the joint model but also under the separate models (i.e. two independent models, since the joint model is bivariate); a brief sketch of this computation follows. The results are shown in Figure 1 and Figure 2. The children's profiles change dramatically depending on whether the joint model or the independent models are used. This change is noticeable in the expected random intercepts as well as in the expected random slopes. The discrepancy between the parameters estimated under the joint model and under the separate models (see the columns labeled Joint estim. and Indep. estim. in Table 5) can be explained by the change observed in the children's profiles. This discrepancy is expected since the correlations between the dimensional random effects are not null (see eq. (18)), and should be taken into account when characterizing the children's profiles. These differences in the children's profiles are crucial when analyzing such a highly important health dataset.
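Concretely, under the joint model the expected random effects of a child are obtained as $\hat\gamma = \operatorname{blockdiag}(\Lambda_{\hat\theta_1}, \Lambda_{\hat\theta_2})\,\mu_{U|Y=y}$ evaluated at the fitted variance parameters, while the separate-model profiles come from two univariate fits; the R sketch below is our own illustration under these assumptions.

```r
## Sketch: children's profiles as expected random effects.
## Joint model: gamma_hat = blockdiag(Lambda1, Lambda2) %*% mu_U_child, where mu_U_child
## is the per-child block of the solution of eq. (7) at the fitted variance parameters.
profile_joint <- function(mu_U_child, Lambda1, Lambda2) {
  A <- rbind(cbind(Lambda1, matrix(0, nrow(Lambda1), ncol(Lambda2))),
             cbind(matrix(0, nrow(Lambda2), ncol(Lambda1)), Lambda2))
  drop(A %*% mu_U_child)   # (intercept_1, slope_1, intercept_2, slope_2)
}

## Separate models: conditional modes from the two univariate lme4 fits of the earlier
## sketch, e.g. ranef(fit1)$id and ranef(fit2)$id give per-child intercepts and slopes
## for each dimension fitted independently.
```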

Figure 1: Expected values of the random intercept/slope (in the malaria application) from the first dimension of the bivariate model. The red profiles are computed with the expected random intercepts and random slopes, using a univariate model that is equal to the first dimension (related to the dependent variable IgG1_A1 and fitted separately) of the model expressed by eqs. (16) and (17). The blue profiles correspond to the random effects computed using the joint bivariate model.

Figure 2: Expected values of the random intercept/slope (in the malaria application) from the second dimension of the bivariate model. The red profiles are computed with the expected random intercepts and random slopes, using a univariate model that is equal to the second dimension (related to the dependent variable IgG1_A2 and fitted separately) of the model expressed by eqs. (16) and (17). The blue profiles correspond to the random effects computed using the joint bivariate model.

5 Conclusion

In the context of multivariate linear mixed-effects models with homoscedastic dimension-wise residuals, we have proposed ML and REML estimation strategies based on profiling the model's deviance and on a Cholesky-type factorization of the random effects covariance matrix. This approach can be considered as a generalization of the approach used by [6] in the R package lme4. Through extensive simulation studies, we have illustrated that the present approach outperforms the traditional EM-based estimates and provides estimates that are numerically consistent for both the fixed effects and the variance components. Another interesting characteristic is its robustness with respect to the initial values of the optimization procedure, which can be chosen at random without affecting the estimation results. These characteristics show the improved computational stability of our procedure in comparison with traditional approaches. Furthermore, the optimization of the profiled ML or REML criterion can be performed easily and rapidly using any existing box-constrained optimizer in the R software. Further extensions of this approach may include heteroscedastic residuals as well as residuals correlated with the random effects, for which the theoretical consistency of the resulting estimators should be established.

Appendix

A Proof of Theorem 2.1 and Lemma 2.1

Proof.

Denoting by fX(.) the density function of any random vector X, the density function of Y is

$$f_Y(y) = \int_{\mathbb{R}^{q_1+q_2}} f_{Y,U}(y,u)\, du,$$

where $f_{Y,U}(y,u)$ is the joint density function of $Y$ and $U$. The dependence between the coordinates of $Y$ comes from $U$, and, conditionally on $U$, the density $f_{Y|U}(y|u)$ factorizes as $f_{Y|U}(y|u) = f_{Y_1|U_1}(y_1|u_1)\, f_{Y_2|U_2}(y_2|u_2)$. Thus,

$$f_{Y,U}(y,u) = f_{Y_1|U_1}(y_1|u_1)\, f_{Y_2|U_2}(y_2|u_2)\, f_U(u) = (2\pi\sigma_1^2)^{-\frac{N}{2}} (2\pi\sigma_2^2)^{-\frac{N}{2}} (2\pi)^{-\frac{q_1+q_2}{2}} |\Sigma_u|^{-\frac{1}{2}} \exp\left\{-\frac{\|y_1 - X_1\beta_1 - Z_1\Lambda_{\theta_1}u_1\|^2}{2\sigma_1^2} - \frac{\|y_2 - X_2\beta_2 - Z_2\Lambda_{\theta_2}u_2\|^2}{2\sigma_2^2} - \frac{1}{2}u^\top\Sigma_u^{-1}u\right\}.$$

Let us denote by $\tilde\Sigma$ a matrix such that $\Sigma_u^{-1} = \tilde\Sigma^\top\tilde\Sigma$. Therefore, $u^\top\Sigma_u^{-1}u = \|\tilde\Sigma u\|^2$ and

$$\begin{aligned}
&\frac{\|y_1 - X_1\beta_1 - Z_1\Lambda_{\theta_1}u_1\|^2}{\sigma_1^2} + \frac{\|y_2 - X_2\beta_2 - Z_2\Lambda_{\theta_2}u_2\|^2}{\sigma_2^2} + u^\top\Sigma_u^{-1}u\\
&\qquad= \frac{\sigma_2^2\|y_1 - X_1\beta_1 - Z_1\Lambda_{\theta_1}u_1\|^2 + \sigma_1^2\|y_2 - X_2\beta_2 - Z_2\Lambda_{\theta_2}u_2\|^2 + \sigma_1^2\sigma_2^2\|\tilde\Sigma u\|^2}{\sigma_1^2\sigma_2^2}\\
&\qquad= \frac{1}{\sigma_1^2\sigma_2^2}\left\|\begin{pmatrix}\sigma_2\, y_1\\ \sigma_1\, y_2\\ 0_{q_1+q_2}\end{pmatrix} - \begin{pmatrix}\sigma_2 X_1 & 0_{N,p_2} & \sigma_2 Z_1\Lambda_{\theta_1} & 0_{N,q_2}\\ 0_{N,p_1} & \sigma_1 X_2 & 0_{N,q_1} & \sigma_1 Z_2\Lambda_{\theta_2}\\ & 0_{q_1+q_2,\,p_1+p_2} & & \sigma_1\sigma_2\tilde\Sigma\end{pmatrix}\begin{pmatrix}\beta\\ u\end{pmatrix}\right\|^2 = \frac{1}{\sigma_1^2\sigma_2^2}\left\|Y_\Lambda - Z_{X\Lambda}\begin{pmatrix}\beta\\ u\end{pmatrix}\right\|^2
\end{aligned} \tag{19}$$
$$= g(\beta,u,\theta,\rho,\sigma)/(\sigma_1^2\sigma_2^2), \tag{20}$$

where $0_{q_1+q_2}$ is the zero vector of $\mathbb{R}^{q_1+q_2}$, and $0_{N,p_1}$, $0_{N,p_2}$, $0_{N,q_1}$, $0_{N,q_2}$ and $0_{q_1+q_2,\,p_1+p_2}$ are zero matrices of dimension $N\times p_1$, $N\times p_2$, $N\times q_1$, $N\times q_2$ and $(q_1+q_2)\times(p_1+p_2)$, respectively. In eq. (19),

$$Y_\Lambda = \begin{pmatrix}\sigma_2\, y_1\\ \sigma_1\, y_2\\ 0_{q_1+q_2}\end{pmatrix},$$

and

$$Z_{X\Lambda} = \begin{pmatrix}\sigma_2 X_1 & 0_{N,p_2} & \sigma_2 Z_1\Lambda_{\theta_1} & 0_{N,q_2}\\ 0_{N,p_1} & \sigma_1 X_2 & 0_{N,q_1} & \sigma_1 Z_2\Lambda_{\theta_2}\\ & 0_{q_1+q_2,\,p_1+p_2} & & \sigma_1\sigma_2\tilde\Sigma\end{pmatrix}.$$

In eq. (20), $g(\beta,u,\theta,\rho,\sigma) = \left\|Y_\Lambda - Z_{X\Lambda}(\beta^\top, u^\top)^\top\right\|^2$ is a squared norm or, more specifically, a residual sum of squares. This is a standard least squares problem:

$$\begin{pmatrix}\hat\beta_{\theta,\rho,\sigma}\\ \mu_{U|Y=y}\end{pmatrix} = \operatorname*{arg\,min}_{\beta,\,u}\ g(\beta,u,\theta,\rho,\sigma),$$

for which the solution satisfies the normal equations:

$$Z_{X\Lambda}^\top Z_{X\Lambda}\begin{pmatrix}\hat\beta_{\theta,\rho,\sigma}\\ \mu_{U|Y=y}\end{pmatrix} = Z_{X\Lambda}^\top Y_\Lambda, \tag{21}$$

where

$$Z_{X\Lambda}^\top Z_{X\Lambda} = \begin{pmatrix}X_\sigma^\top X_\sigma & X_\sigma^\top Z_{\sigma\theta}\\ Z_{\sigma\theta}^\top X_\sigma & Z_{\sigma\theta}^\top Z_{\sigma\theta} + \sigma_1^2\sigma_2^2\,\Sigma_u^{-1}\end{pmatrix} \qquad\text{and}\qquad Z_{X\Lambda}^\top Y_\Lambda = \begin{pmatrix}X_\sigma^\top\\ Z_{\sigma\theta}^\top\end{pmatrix} Y_\sigma.$$

This concludes the proof of Lemma 2.1. In eq. (21), we behave as if $\theta$, $\rho$ and $\sigma$ were known, which justifies the notation $\hat\beta_{\theta,\rho,\sigma}$, meaning that the resulting estimator is a function of $\theta$, $\rho$ and $\sigma$.

By setting $p = p_1 + p_2$, we have $\dim(Z_{X\Lambda}) = (2N+q)\times(p+q)$, and $S = \operatorname{Im}(Z_{X\Lambda})$ is a subspace of $\mathbb{R}^{2N+q}$. Moreover, $Y_\Lambda \in \mathbb{R}^{2N+q}$ and $Z_{X\Lambda}\begin{pmatrix}\hat\beta_{\theta,\rho,\sigma}\\ \mu_{U|Y=y}\end{pmatrix}$ is the orthogonal projection of $Y_\Lambda$ onto $S$. Then,

$$Z_{X\Lambda}\,w \ \perp\ \left(Y_\Lambda - Z_{X\Lambda}\begin{pmatrix}\hat\beta_{\theta,\rho,\sigma}\\ \mu_{U|Y=y}\end{pmatrix}\right), \qquad \forall\, w \in \mathbb{R}^{p+q},$$

and $g(\beta,u,\theta,\rho,\sigma)$ can then be rewritten as:

$$\begin{aligned}
g(\beta,u,\theta,\rho,\sigma) &= \left\|Y_\Lambda - Z_{X\Lambda}\begin{pmatrix}\beta\\ u\end{pmatrix} + Z_{X\Lambda}\begin{pmatrix}\hat\beta_{\theta,\rho,\sigma}\\ \mu_{U|Y=y}\end{pmatrix} - Z_{X\Lambda}\begin{pmatrix}\hat\beta_{\theta,\rho,\sigma}\\ \mu_{U|Y=y}\end{pmatrix}\right\|^2\\
&= \left\|Y_\Lambda - Z_{X\Lambda}\begin{pmatrix}\hat\beta_{\theta,\rho,\sigma}\\ \mu_{U|Y=y}\end{pmatrix}\right\|^2 + \left\|Z_{X\Lambda}\begin{pmatrix}\beta - \hat\beta_{\theta,\rho,\sigma}\\ u - \mu_{U|Y=y}\end{pmatrix}\right\|^2\\
&= \left\|Y_\Lambda - Z_{X\Lambda}\begin{pmatrix}\hat\beta_{\theta,\rho,\sigma}\\ \mu_{U|Y=y}\end{pmatrix}\right\|^2 + \begin{pmatrix}\beta - \hat\beta_{\theta,\rho,\sigma}\\ u - \mu_{U|Y=y}\end{pmatrix}^\top Z_{X\Lambda}^\top Z_{X\Lambda}\begin{pmatrix}\beta - \hat\beta_{\theta,\rho,\sigma}\\ u - \mu_{U|Y=y}\end{pmatrix}.
\end{aligned}$$

$Z_{X\Lambda}^\top Z_{X\Lambda}$ can be Cholesky decomposed as

$$Z_{X\Lambda}^\top Z_{X\Lambda} = \begin{pmatrix}R_X^\top & R_{ZX}^\top\\ 0 & L_{\theta,\rho,\sigma}\end{pmatrix}\begin{pmatrix}R_X & 0\\ R_{ZX} & L_{\theta,\rho,\sigma}^\top\end{pmatrix},$$

where

$$L_{\theta,\rho,\sigma}L_{\theta,\rho,\sigma}^\top = Z_{\sigma\theta}^\top Z_{\sigma\theta} + \sigma_1^2\sigma_2^2\,\Sigma_u^{-1}.$$

Thereafter,

$$g(\beta,u,\theta,\rho,\sigma) = \|Y_\sigma - X_\sigma\hat\beta_{\theta,\rho,\sigma} - Z_{\sigma\theta}\,\mu_{U|Y=y}\|^2 + \sigma_1^2\sigma_2^2\,\mu_{U|Y=y}^\top\Sigma_u^{-1}\mu_{U|Y=y} + \|R_X(\beta - \hat\beta_{\theta,\rho,\sigma})\|^2 + \|R_{ZX}(\beta - \hat\beta_{\theta,\rho,\sigma}) + L_{\theta,\rho,\sigma}^\top(u - \mu_{U|Y=y})\|^2.$$

By setting

$$r(\hat\beta_{\theta,\rho,\sigma},\mu_{U|Y=y}) = \|Y_\sigma - X_\sigma\hat\beta_{\theta,\rho,\sigma} - Z_{\sigma\theta}\,\mu_{U|Y=y}\|^2 + \sigma_1^2\sigma_2^2\,\mu_{U|Y=y}^\top\Sigma_u^{-1}\mu_{U|Y=y},$$

and returning to the calculation of $f_Y(y)$, we get

$$\begin{aligned}
f_Y(y) &= \int \frac{\exp\left\{-\frac{r(\hat\beta_{\theta,\rho,\sigma},\mu_{U|Y=y}) + \|R_X(\beta-\hat\beta_{\theta,\rho,\sigma})\|^2 + \|R_{ZX}(\beta-\hat\beta_{\theta,\rho,\sigma}) + L_{\theta,\rho,\sigma}^\top(u-\mu_{U|Y=y})\|^2}{2\sigma_1^2\sigma_2^2}\right\}}{(2\pi\sigma_1^2)^{N/2}(2\pi\sigma_2^2)^{N/2}(2\pi)^{q/2}|\Sigma_u|^{1/2}}\, du\\
&= \frac{\exp\left\{-\frac{r(\hat\beta_{\theta,\rho,\sigma},\mu_{U|Y=y}) + \|R_X(\beta-\hat\beta_{\theta,\rho,\sigma})\|^2}{2\sigma_1^2\sigma_2^2}\right\}}{(2\pi\sigma_1^2)^{N/2}(2\pi\sigma_2^2)^{N/2}(2\pi)^{q/2}|\Sigma_u|^{1/2}} \times \int \exp\left\{-\frac{\|R_{ZX}(\beta-\hat\beta_{\theta,\rho,\sigma}) + L_{\theta,\rho,\sigma}^\top(u-\mu_{U|Y=y})\|^2}{2\sigma_1^2\sigma_2^2}\right\} du.
\end{aligned}$$

By setting $v = R_{ZX}(\beta-\hat\beta_{\theta,\rho,\sigma}) + L_{\theta,\rho,\sigma}^\top(u-\mu_{U|Y=y})$, we have $du = |L_{\theta,\rho,\sigma}|^{-1}\, dv$ and

$$\begin{aligned}
f_Y(y) &= \frac{\exp\left\{-\frac{r(\hat\beta_{\theta,\rho,\sigma},\mu_{U|Y=y}) + \|R_X(\beta-\hat\beta_{\theta,\rho,\sigma})\|^2}{2\sigma_1^2\sigma_2^2}\right\}(\sigma_1^2\sigma_2^2)^{\frac{q}{2}}}{(2\pi\sigma_1^2)^{N/2}(2\pi\sigma_2^2)^{N/2}|\Sigma_u|^{1/2}|L_{\theta,\rho,\sigma}|}\int (2\pi\sigma_1^2\sigma_2^2)^{-\frac{q}{2}}\exp\left\{-\frac{\|v\|^2}{2\sigma_1^2\sigma_2^2}\right\} dv\\
&= \frac{\exp\left\{-\frac{r(\hat\beta_{\theta,\rho,\sigma},\mu_{U|Y=y}) + \|R_X(\beta-\hat\beta_{\theta,\rho,\sigma})\|^2}{2\sigma_1^2\sigma_2^2}\right\}(\sigma_1^2\sigma_2^2)^{\frac{q}{2}}}{(2\pi\sigma_1^2)^{N/2}(2\pi\sigma_2^2)^{N/2}|\Sigma_u|^{1/2}|L_{\theta,\rho,\sigma}|}.
\end{aligned} \tag{22}$$

The log-likelihood to be maximized can therefore be expressed as,

$$\ell(\beta,\theta,\rho,\sigma\,|\,y) = -\frac{r(\hat\beta_{\theta,\rho,\sigma},\mu_{U|Y=y}) + \|R_X(\beta - \hat\beta_{\theta,\rho,\sigma})\|^2}{2\sigma_1^2\sigma_2^2} - \frac{N-q}{2}\log(\sigma_1^2\sigma_2^2) - \frac{1}{2}\log(|\Sigma_u|) - \frac{1}{2}\log(|L_{\theta,\rho,\sigma}|^2).$$

B Proof of Theorem 2.2

Proof.

Using the expression of fY(y) in eq. (22), we get

$$\begin{aligned}
L(\theta,\rho,\sigma\,|\,y) &= \int_{\mathbb{R}^p} f_Y(y)\, d\beta = \frac{\exp\left\{-\frac{r(\hat\beta_{\theta,\rho,\sigma},\mu_{U|Y=y})}{2\sigma_1^2\sigma_2^2}\right\}(\sigma_1^2\sigma_2^2)^{\frac{q}{2}}}{(2\pi\sigma_1^2)^{N/2}(2\pi\sigma_2^2)^{N/2}|\Sigma_u|^{1/2}|L_{\theta,\rho,\sigma}|}\int_{\mathbb{R}^p}\exp\left\{-\frac{\|R_X(\beta-\hat\beta_{\theta,\rho,\sigma})\|^2}{2\sigma_1^2\sigma_2^2}\right\} d\beta\\
&= \frac{\exp\left\{-\frac{r(\hat\beta_{\theta,\rho,\sigma},\mu_{U|Y=y})}{2\sigma_1^2\sigma_2^2}\right\}(\sigma_1^2\sigma_2^2)^{\frac{p+q-N}{2}}(2\pi)^{-\frac{2N-p}{2}}}{|\Sigma_u|^{1/2}|L_{\theta,\rho,\sigma}|\,|R_X|}\int_{\mathbb{R}^p}(2\pi\sigma_1^2\sigma_2^2)^{-\frac{p}{2}}\exp\left\{-\frac{\|t\|^2}{2\sigma_1^2\sigma_2^2}\right\} dt,
\end{aligned}$$

where $t = R_X(\beta - \hat\beta_{\theta,\rho,\sigma})$, so that $d\beta = |R_X|^{-1}\, dt$, and $\int_{\mathbb{R}^p}(2\pi\sigma_1^2\sigma_2^2)^{-\frac{p}{2}}\exp\left\{-\frac{\|t\|^2}{2\sigma_1^2\sigma_2^2}\right\} dt = 1$.   □

References

[1] Fitzmaurice GM, Laird NM, Ware JH. Applied longitudinal analysis, volume 998. New Jersey: John Wiley & Sons; 2012. doi:10.1002/9781119513469
[2] Hartley HO, Rao JN. Maximum-likelihood estimation for the mixed analysis of variance model. Biometrika. 1967;54:93–108. doi:10.1093/biomet/54.1-2.93
[3] Hedeker D, Gibbons RD. Longitudinal data analysis, volume 451. John Wiley & Sons; 2006.
[4] Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–974. doi:10.2307/2529876
[5] Verbeke G. Linear mixed models for longitudinal data. In: Linear mixed models in practice. New York: Springer; 1997;126:63–153. doi:10.1007/978-1-4612-2294-1_3
[6] Bates D, Maechler M, Bolker B, Walker S, et al. lme4: Linear mixed-effects models using eigen and s4. R Package Version. 2014;7:1.
[7] Halekoh U, Højsgaard S, Yan J. The R package geepack for generalized estimating equations. J Stat Software. 2006;15:1–11. doi:10.18637/jss.v015.i02
[8] Littell R, Milliken G, Stroup W, Wolfinger R, Schabenberger O. Random coefficient models. SAS System for Mixed Models. Cary, NC: SAS Institute Inc; 1996:253–66.
[9] Pinheiro J, Bates D, DebRoy S, Sarkar D, et al. Linear and nonlinear mixed effects models. R Package Version. 2007;3:57.
[10] Verbeke G, Fieuws S, Molenberghs G, Davidian M. The analysis of multivariate longitudinal data: a review. Stat Methods Med Res. 2014;23:42–59. doi:10.1177/0962280212445834
[11] Jensen KL, Spiild H, Toftum J. Implementation of multivariate linear mixed-effects models in the analysis of indoor climate performance experiments. Int J Biometeorol. 2012;56:129–36. doi:10.1007/s00484-011-0404-y
[12] Sammel M, Lin X, Ryan L. Multivariate linear mixed models for multiple outcomes. Stat Med. 1999;18:2479–92. doi:10.1002/(SICI)1097-0258(19990915/30)18:17/18<2479::AID-SIM270>3.0.CO;2-F
[13] Wang W-L, Fan T-H. ECM-based maximum likelihood inference for multivariate linear mixed models with autoregressive errors. Comput Stat Data Anal. 2010;54:1328–41. doi:10.1016/j.csda.2009.11.021
[14] Schafer JL, Yucel RM. Computational strategies for multivariate linear mixed-effects models with missing values. J Comput Graphical Stat. 2002;11:437–57. doi:10.1198/106186002760180608
[15] Adjakossa EH, Sadissou I, Hounkonnou MN, Nuel G. Multivariate longitudinal analysis with bivariate correlation test. PLoS One. 2016;11:e0159649. doi:10.1371/journal.pone.0159649
[16] Henderson CR. Estimation of genetic parameters. Biometrics. 1950;6:186–7.
[17] Goldberger AS. Best linear unbiased prediction in the generalized linear regression model. J Am Stat Assoc. 1962;57:369–75. doi:10.1080/01621459.1962.10480665
[18] Searle S, Casella G, McCulloch C. Variance components. New York, NY, USA: John Wiley & Sons; 1992. doi:10.1002/9780470316856
[19] Robinson GK. That BLUP is a good thing: the estimation of random effects. Stat Sci. 1991;6:15–32.
[20] Searle SR. Linear models. New York: Wiley & Sons; 1971.
[21] Searle SR. An overview of variance component estimation. Metrika. 1995;42:215–30. doi:10.1007/BF01894301
[22] Rao CR. Estimation of variance and covariance components - MINQUE theory. J Multivariate Anal. 1971;1:257–75. doi:10.1016/0047-259X(71)90001-7
[23] Lee Y, Nelder J. Generalized linear models for the analysis of quality-improvement experiments. Can J Stat. 1998;26:95–105. doi:10.2307/3315676
[24] Gumedze F, Dunne T. Parameter estimation and inference in the linear mixed model. Linear Algebra Appl. 2011;435:1920–44. doi:10.1016/j.laa.2011.04.015
[25] Lindstrom MJ, Bates DM. Newton-Raphson and EM algorithms for linear mixed-effects models for repeated-measures data. J Am Stat Assoc. 1988;83:1014–22. doi:10.1080/01621459.1988.10478693
[26] An X, Yang Q, Bentler PM. A latent factor linear mixed model for high-dimensional longitudinal data analysis. Stat Med. 2013;32:4229–39. doi:10.1002/sim.5825
[27] Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B (Methodological). 1977;39:1–38.
[28] Meng X-L, Rubin DB. Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika. 1993;80:267–78. doi:10.1093/biomet/80.2.267
[29] Shah A, Laird N, Schoenfeld D. A random-effects model for multiple characteristics with possibly missing data. J Am Stat Assoc. 1997;92:775–9. doi:10.1080/01621459.1997.10474030
[30] Nie L. Convergence rate of MLE in generalized linear and nonlinear mixed-effects models: theory and applications. J Stat Plann Inference. 2007;137:1787–804. doi:10.1016/j.jspi.2005.06.010
[31] Djènontin A, Bio-Bangana S, Moiroux N, Henry M-C, Bousari O, Chabi J, et al. Culicidae diversity, malaria transmission and insecticide resistance alleles in malaria vectors in Ouidah-Kpomasse-Tori district from Benin (West Africa): a pre-intervention study. Parasit Vectors. 2010;3:83. doi:10.1186/1756-3305-3-83
[32] Anders RF, Adda CG, Foley M, Norton RS. Recombinant protein vaccines against the asexual blood-stages of Plasmodium falciparum. Human Vaccines. 2010;6:39–53. doi:10.4161/hv.6.1.10712
[33] McCarthy JS, Marjason J, Elliott S, Fahey P, Bang G, Malkin E, et al. A phase 1 trial of MSP2-C1, a blood-stage malaria vaccine containing 2 isoforms of MSP2 formulated with Montanide ISA 720. PLoS One. 2011;6:e24413. doi:10.1371/journal.pone.0024413

Received: 2017-10-03
Revised: 2019-02-07
Accepted: 2019-05-05
Published Online: 2019-06-21

© 2019 Walter de Gruyter GmbH, Berlin/Boston
