Approximate pairwise likelihood inference in SGLM models with skew normal latent variables

https://doi.org/10.1016/j.cam.2021.113692

Abstract

Spatial generalized linear mixed models are commonly employed for modeling discrete spatial responses that are observed over a continuous region. A standard assumption in these models is that the latent variables are normally distributed; however, skewed residuals appear in some spatial generalized linear mixed models. In this study, we consider a closed skew Gaussian random field for the spatial latent variables in spatial generalized linear mixed models and present a new approximate pairwise likelihood approach to estimate the parameters. To obtain the pairwise maximum likelihood estimates of the parameters, we introduce a new algorithm that uses a linearization method in the composite marginal likelihood together with the EM algorithm. Techniques for calculating parameter estimates and performing spatial prediction in this class of skew models are also proposed. The performance of the proposed model and method is illustrated through a simulation study and an application to the Tehran air quality index data set.

Introduction

Spatial generalized linear mixed (SGLM) models are commonly used to model non-Gaussian discrete spatial responses. In these models, the spatial dependence of the data can be represented through random effects or latent variables. The most common assumption is that the latent variables follow a multivariate normal distribution. Inference for the model parameters and spatial prediction have been studied intensively under the normal assumption; see [1], [2], [3], [4], [5] and [6]. The SGLM model has recently received a great deal of attention. Smith et al. [7] proposed Poisson cokriging as an SGLM model and extended this methodology to predict the latent variables. Walder and Hanks [8] studied this model using Laplace moving average random fields and a Bayesian approach.

In most of these studies, the latent variables are modeled by the multivariate normal distribution. Hosseini et al. [9], Hosseini and Mohammadzadeh [10] and Hosseini and Karimi [11] showed that an erroneous normality assumption affects the estimation of the model parameters and the accuracy of spatial prediction in SGLM models. They used more flexible distributions such as the multivariate skew normal (SN) and the closed skew normal (CSN). Vicente et al. [12] defined nonlinear regression models with SN errors and discussed classical and Bayesian approaches for these models. Karimi and Mohammadzadeh [13] discussed spatial regression models with closed skew normal correlated errors and showed that this modeling increases the accuracy of the parameter estimates. Alodat and Shakhatreh [14] modified the nonlinear regression model with SN error terms and showed that it performs better than the model with normal error terms.

Azzalini and Dalla-Valle [15] introduced the SN distribution as an extension of the normal distribution; it is contained in the new family of skew-normal distributions presented in [16]. The CSN distribution extends both the SN and normal distributions [17], [18], [19]. The SN and CSN distributions are more flexible because they include the normal distribution as a special case and carry an extra parameter that regulates skewness. Kim and Mallick [20] suggested a model based on the SN distribution to handle skewed spatial data. Karimi et al. [21], Karimi and Mohammadzadeh [22] and Alodat and Al-Rawwash [23] defined a new closed skew Gaussian (CSG) random field, and Rimstad and Omre [24] introduced an extension of this CSG random field based on a new parameterization of the CSN distribution. Tagle et al. [25] presented a new spatio-temporal model for non-Gaussian data using the skew-t distribution and showed that the skew-t model is more accurate than the Gaussian model.

In this paper, we use the CSG random field proposed in Rimstad and Omre [24] and introduce a new SGLM model with skew spatial latent variables. We consider the marginal composite likelihood method introduced by Varin et al. [4] to estimate the model parameters. Their method is computationally much cheaper than the full likelihood, especially in high dimensions. We also obtain an approximate CSN distribution for the conditional distribution of the latent variables given the spatial responses using a linearization method. The main contribution of the current work is a new approximate pairwise likelihood inference method for the proposed SGLM model.

This paper is organized as follows. Section 2 defines the closed skew normal distribution. The CSG random field and the SGLMM with closed skew normal latent variables are introduced in Section 3. The proposed approximate pairwise likelihood method and the conditional distribution at unsampled locations used for prediction are described in Section 4. Section 5 presents a simulation study to implement and evaluate the proposed model. Section 6 shows an application to Tehran Air Quality Index (AQI) data from March 2019 to March 2020. Closing remarks are given in Section 7.

Section snippets

Closed skew normal distribution

The CSN distribution is a class of statistical distributions that includes the SN and normal distributions as special cases. The CSN distribution extends the SN distribution by allowing more flexibility in the skewness directions. It has some very desirable properties, similar to those of the normal distribution. For instance, the CSN distribution is closed under marginalization, conditioning, and linear transformations (full column or row rank); see [26]. A n
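To make the special cases concrete, the following is a minimal sketch of a CSN density evaluator. It assumes the common CSN_{p,q}(mu, Sigma, D, nu, Delta) parameterization from the CSN literature, in which the density is phi_p(y; mu, Sigma) Phi_q(D(y - mu); nu, Delta) / Phi_q(0; nu, Delta + D Sigma D^T). The function name csn_pdf is hypothetical, and the parameterization used in the paper itself (following [24]) may differ.

```python
import numpy as np
from scipy.stats import multivariate_normal


def csn_pdf(y, mu, Sigma, D, nu, Delta):
    """Density of CSN_{p,q}(mu, Sigma, D, nu, Delta) at a single point y.

    Assumed parameterization (not taken from the paper):
    f(y) = phi_p(y; mu, Sigma) * Phi_q(D (y - mu); nu, Delta)
           / Phi_q(0; nu, Delta + D Sigma D^T)
    """
    y, mu, nu = (np.atleast_1d(np.asarray(a, dtype=float)) for a in (y, mu, nu))
    Sigma, D, Delta = (
        np.atleast_2d(np.asarray(a, dtype=float)) for a in (Sigma, D, Delta)
    )
    q = D.shape[0]
    # Symmetric Gaussian kernel phi_p(y; mu, Sigma).
    kernel = multivariate_normal.pdf(y, mean=mu, cov=Sigma)
    # Skewing factor Phi_q(D (y - mu); nu, Delta).
    skew = multivariate_normal.cdf(D @ (y - mu), mean=nu, cov=Delta)
    # Normalizing constant Phi_q(0; nu, Delta + D Sigma D^T).
    const = multivariate_normal.cdf(
        np.zeros(q), mean=nu, cov=Delta + D @ Sigma @ D.T
    )
    return kernel * skew / const
```

Under this parameterization, setting D to zero removes the skewing factor and returns the normal density, while p = q = 1 with mu = nu = 0, Sigma = Delta = 1 and D = alpha gives the univariate skew normal density 2 phi(y) Phi(alpha y).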

SGLMM with closed skew normal latent variables

The random field $\{Z(s); s \in D \subset \mathbb{R}^d\}$ is called a Gaussian random field if the values at any finite set of points in the field, $Z(s_1), \ldots, Z(s_n)$, are jointly multivariate normal. Now, let $U(s) = \{(U_1(s), U_2(s)), s \in D \subset \mathbb{R}^d\}$ be a bivariate Gaussian random field and $U_2 = [U_2(s_1), \ldots, U_2(s_q)]$ with finite and fixed $q$. Rimstad and Omre [24] defined the CSG random field as follows: $\{X(s) = [U_1(s) \mid U_2 \geq 0], s \in D\}$, if $X = (X(s_1), \ldots, X(s_n))$, $n = 1, 2, 3, \ldots$, has a CSN distribution for all finite sets of locations $(s_1, \ldots, s_n)$. They concluded that $X(s)$ is approximately
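The conditioning construction can be illustrated with a short simulation. The sketch below assumes that the conditioning event is that every component of U2 is nonnegative, as in the usual construction of the CSN distribution, and draws from [U1 | U2 >= 0] by plain rejection sampling. The function name, the batch size, and the example covariance are hypothetical, and rejection sampling is used only for clarity; it becomes inefficient as q grows.

```python
import numpy as np


def sample_csn_by_rejection(mean, cov, n_keep, n_samples, rng, batch=10_000):
    """Draw from [U1 | U2 >= 0] where (U1, U2) is jointly Gaussian.

    U1 is the first n_keep coordinates of the joint vector and U2 is the
    rest. Draws whose U2 part has any negative component are rejected.
    """
    chunks, kept = [], 0
    while kept < n_samples:
        u = rng.multivariate_normal(mean, cov, size=batch)
        ok = np.all(u[:, n_keep:] >= 0.0, axis=1)  # conditioning event
        chunks.append(u[ok, :n_keep])
        kept += int(ok.sum())
    return np.concatenate(chunks)[:n_samples]


# Illustrative usage with one latent value and one hidden conditioning variable.
rng = np.random.default_rng(0)
joint_cov = np.array([[1.0, 0.9], [0.9, 1.0]])
x = sample_csn_by_rejection(np.zeros(2), joint_cov, n_keep=1, n_samples=5000, rng=rng)
```

With a positive correlation between U1 and U2, the retained draws are shifted and skewed to the right relative to the unconditional Gaussian, which is the source of the skewness in the field.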

Approximate pairwise likelihood inference

The maximum likelihood estimator has merits such as consistency, asymptotic normality and asymptotic efficiency. However, full maximum likelihood is not always computationally feasible when the dimension is high. The pairwise likelihood approach, which is the product of likelihoods for pairs of observations, offers a significant reduction of the computational cost relative to the conventional likelihood, because this approach uses a limited set of double integrals instead of high-dimensional
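The idea of replacing one high-dimensional integral by many double integrals can be sketched as follows. This is an illustration only, not the paper's algorithm: it assumes Poisson responses with a log link, a zero-mean Gaussian (not CSN) latent field with exponential correlation exp(-h / phi), and pairs restricted to a distance threshold max_dist. Each double integral is evaluated by a tensor-product Gauss-Hermite rule rather than by the linearization and EM steps described in the paper. All names and parameters are hypothetical.

```python
import numpy as np
from scipy.stats import poisson


def pairwise_loglik(y, coords, beta, sigma2, phi, max_dist, n_nodes=20):
    """Pairwise log-likelihood for a Poisson model with Gaussian latent pairs."""
    t, w = np.polynomial.hermite.hermgauss(n_nodes)
    z = np.sqrt(2.0) * t        # nodes rescaled for a standard normal
    w = w / np.sqrt(np.pi)      # weights rescaled to sum to one
    z1, z2 = np.meshgrid(z, z, indexing="ij")
    w2 = np.outer(w, w)
    sigma = np.sqrt(sigma2)
    total = 0.0
    n = len(y)
    for i in range(n - 1):
        for j in range(i + 1, n):
            h = np.linalg.norm(coords[i] - coords[j])
            if h > max_dist:
                continue  # distant pairs are left out of the product
            rho = np.exp(-h / phi)
            # Correlated latent pair built from independent standard normals.
            x1 = sigma * z1
            x2 = sigma * (rho * z1 + np.sqrt(1.0 - rho**2) * z2)
            integrand = poisson.pmf(y[i], np.exp(beta + x1)) * poisson.pmf(
                y[j], np.exp(beta + x2)
            )
            total += np.log(np.sum(w2 * integrand))  # one double integral
    return total
```

The cost grows with the number of retained pairs times the square of the number of quadrature nodes, regardless of the total number of locations, which is what makes the approach attractive in high dimensions.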

Simulation study

In this section, a simulation study is carried out to evaluate the performance of the approximate pairwise likelihood method based on realizations from the CSG random field. In addition, we compare the results of the APEM-CSN algorithm presented in this paper with those of the APEM-Normal algorithm described in [34]. We simulate n = 600 random locations on an irregular grid inside the map of Romania. An isotropic exponential correlation structure for $C_\varphi$ with the spatial range parameter $\varphi$ is used, as in Eq. (8). We fix
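A simplified version of this setup can be sketched in a few lines. Since Eq. (8) is not reproduced in this snippet, the sketch assumes the standard isotropic exponential form exp(-h / phi). It also places the 600 locations in the unit square instead of the map of Romania, uses a Gaussian latent field instead of the CSG field, and generates Poisson counts with a hypothetical intercept; every numeric value below is a placeholder, not a setting from the paper.

```python
import numpy as np


def exp_corr_matrix(coords, phi):
    """Isotropic exponential correlation matrix with entries exp(-h / phi)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))
    return np.exp(-dist / phi)


rng = np.random.default_rng(1)
n = 600
coords = rng.uniform(0.0, 1.0, size=(n, 2))       # placeholder region
C = exp_corr_matrix(coords, phi=0.2)              # placeholder range parameter
# Small jitter on the diagonal keeps the Cholesky factorization stable.
L = np.linalg.cholesky(C + 1e-10 * np.eye(n))
latent = L @ rng.standard_normal(n)               # Gaussian latent field
counts = rng.poisson(np.exp(0.5 + latent))        # placeholder intercept 0.5
```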

Application

Air pollution is one of the major problems in big cities. Poor health status and the number of unhealthy days are important factors in choosing a place to live for many people, especially those who have respiratory and lung diseases. In this section, the number of unhealthy days in different locations of Tehran is studied based on information obtained from air pollution monitoring stations. Tehran city is the capital of Iran with a total area of 18,814 square kilometers

Conclusion

In this work, we used the approximate pairwise likelihood method for SGLM models with skew latent variables. For the latent variables, we used a general structure of the CSN distribution that includes most skew-normal distributions. This family of distributions, owing to its closure under linear transformations, marginalization, and conditioning, provided suitable conditions for obtaining the conditional predictive distributions of the latent variables given the response variable. However, in some

References (41)

  • Xu X. et al., On the robustness of maximum composite likelihood estimate, J. Statist. Plann. Inference (2011)
  • Mahmoudian B., On the existence of some skew-Gaussian random field models, Statist. Probab. Lett. (2018)
  • Breslow N.E. et al., Approximate inference in generalized linear mixed models, J. Amer. Statist. Assoc. (1993)
  • Diggle P. et al., Model-based geostatistics, J. R. Stat. Soc. Ser. C. Appl. Stat. (1998)
  • Zhang H., On estimation and prediction for spatial generalized linear mixed models, Biometrics (2002)
  • Eidsvik J. et al., Approximate Bayesian inference in spatial generalized linear mixed models, Scand. J. Stat. (2009)
  • Hosseini F. et al., Bayesian prediction for spatial GLMM’s with Closed Skew Normal latent variables, Aust. N. Z. J. Stat. (2012)
  • Hosseini F. et al., Approximate likelihood inference in spatial generalized linear mixed models with closed skew normal latent variables, Comm. Statist. Simulation Comput. (2020)
  • Vicente G.C. et al., A nonlinear regression model with skew-normal errors, Statist. Papers (2010)
  • Karimi O. et al., Bayesian spatial regression models with closed skew normal correlated errors and missing, Statist. Papers (2012)