Approximate pairwise likelihood inference in SGLM models with skew normal latent variables
Introduction
Spatial generalized linear mixed (SGLM) models are commonly used to model non-Gaussian discrete spatial responses. In these models, the spatial dependence of the data is represented through random effects, or latent variables. The most common assumption is that the latent variables follow a multivariate normal distribution. Inference for the model parameters and spatial prediction have been studied intensively under the normal assumption; see [1], [2], [3], [4], [5] and [6]. The SGLM model has recently received a great deal of attention: Smith et al. [7] proposed Poisson cokriging as an SGLM model and extended this methodology to predict the latent variables, and Walder and Hanks [8] studied this model using Laplace moving average random fields and a Bayesian approach.
In most of these studies, the latent variables are modeled by the multivariate normal distribution. Hosseini et al. [9], Hosseini and Mohammadzadeh [10] and Hosseini and Karimi [11] showed that an erroneous normality assumption affects both the estimation of the model parameters and the accuracy of spatial prediction in SGLM models. They used more flexible distributions such as the multivariate skew normal (SN) and the closed skew normal (CSN). Vicente et al. [12] defined nonlinear regression models with SN errors and discussed classical and Bayesian approaches for these models. Karimi and Mohammadzadeh [13] discussed spatial regression models with closed skew normal correlated errors and showed that this modeling increases the accuracy of the parameter estimates. Alodat and Shakhatreh [14] modified nonlinear regression with SN error terms and showed that it performs better than the model with normal error terms.
Azzalini and Dalla-Valle [15] introduced the SN distribution as an extension of the normal distribution; it is contained in the family of skew-normal distributions presented in [16]. The CSN distribution extends both the SN and normal distributions [17], [18], [19]. The SN and CSN distributions are more flexible because they include the normal distribution as a special case and carry an extra parameter that regulates skewness. Kim and Mallick [20] suggested a model based on the SN distribution to handle skewed spatial data. Karimi et al. [21], Karimi and Mohammadzadeh [22] and Alodat and Al-Rawwash [23] defined a new closed skew Gaussian (CSG) random field, and Rimstad and Omre [24] introduced an extension of this CSG random field, for which they defined a new parameterization of the CSN distribution. Tagle et al. [25] presented a new spatio-temporal model for non-Gaussian data using the skew-t distribution and showed that the skew-t model is more accurate than the Gaussian model.
In this paper, we use the CSG random field proposed by Rimstad and Omre [24] and introduce a new SGLM model with skew spatial latent variables. We consider the marginal composite likelihood method introduced by Varin et al. [4] to estimate the model parameters. Their method is computationally more efficient than the full likelihood, especially in high dimensions. We also obtain an approximate CSN distribution for the conditional distribution of the latent variables given the spatial responses, using a linearization method. The main contribution of the current work is a new approximate pairwise likelihood inference method for the proposed SGLM model.
This paper is organized as follows. In Section 2, the closed skew normal distribution is defined. The CSG random field and the SGLMM with closed skew normal latent variables are introduced in Section 3. The proposed approximate pairwise likelihood method, as well as the conditional distribution at unsampled locations used for prediction, is described in Section 4. Section 5 presents a simulation study to implement and evaluate the proposed model. Section 6 shows an application to Tehran Air Quality Index (AQI) data from March 2019 to March 2020. Closing remarks are given in Section 7.
Section snippets
Closed skew normal distribution
The CSN distribution is a class of statistical distributions that includes the SN and normal distributions as special cases. The CSN distribution extends the SN distribution by allowing more flexibility in the skewness directions. It has some very desirable properties, similar to those of the normal distribution. For instance, the CSN distribution is closed under marginalization, conditioning, and linear transformations (full column or row rank); see [26]. A
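To make the definition concrete, the following is a minimal sketch of the CSN density, assuming the widely used parameterization CSN_{p,q}(mu, Sigma, D, nu, Delta), in which the density is a p-variate normal density multiplied by a q-variate normal distribution function and divided by a normalizing constant. The function name `csn_pdf` and its argument names are illustrative choices, and the parameterization adopted later in the paper (following [24]) may differ from this one.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm


def csn_pdf(y, mu, Sigma, D, nu, Delta):
    """Density of CSN_{p,q}(mu, Sigma, D, nu, Delta) at a single point y.

    Assumed form (standard parameterization, not necessarily the paper's):
        f(y) = phi_p(y; mu, Sigma) * Phi_q(D (y - mu); nu, Delta)
               / Phi_q(0; nu, Delta + D Sigma D^T)
    """
    y = np.atleast_1d(np.asarray(y, dtype=float))
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    Sigma = np.atleast_2d(np.asarray(Sigma, dtype=float))
    D = np.atleast_2d(np.asarray(D, dtype=float))        # shape (q, p)
    nu = np.atleast_1d(np.asarray(nu, dtype=float))
    Delta = np.atleast_2d(np.asarray(Delta, dtype=float))
    q = D.shape[0]

    # p-variate normal density evaluated at y
    normal_part = multivariate_normal.pdf(y, mean=mu, cov=Sigma)
    # q-variate normal CDF that introduces the skewness
    skew_part = multivariate_normal.cdf(D @ (y - mu), mean=nu, cov=Delta)
    # normalizing constant so that the density integrates to one
    norm_const = multivariate_normal.cdf(
        np.zeros(q), mean=nu, cov=Delta + D @ Sigma @ D.T
    )
    return normal_part * skew_part / norm_const


# Special case p = q = 1, mu = nu = 0, Sigma = Delta = 1, D = alpha:
# the density reduces to the univariate skew normal 2 * phi(y) * Phi(alpha * y).
alpha = 2.0
sn_value = csn_pdf(0.5, 0.0, 1.0, alpha, 0.0, 1.0)
sn_reference = 2.0 * norm.pdf(0.5) * norm.cdf(alpha * 0.5)
```

Setting D to a zero matrix removes the skewness term, so the same function then returns the ordinary multivariate normal density; this is the sense in which the normal distribution is a special case.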
SGLMM with closed skew normal latent variables
A random field is called a Gaussian random field if the values at any finite set of points in the field are jointly multivariate normal. Now, let be the bivariate Gaussian random field and with finite and fixed q. Rimstad and Omre [24] defined the CSG random field as follows: if has a CSN distribution for all finite locations . They concluded that is approximately
Approximate pairwise likelihood inference
The maximum likelihood estimator has merits such as consistency, asymptotic normality and asymptotic efficiency. However, full maximum likelihood is not always computationally feasible when the dimension is high. The pairwise likelihood approach, which uses the product of the likelihoods for pairs of observations, offers a significant reduction in computational cost relative to the conventional likelihood, because it uses a limited set of double integrals instead of high-dimensional
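The idea of replacing one high-dimensional integral by many double integrals can be sketched as follows. This is a simplified illustration, not the method of the paper: it assumes Poisson responses with a log link, a Gaussian (rather than CSN) latent field with an exponential correlation function, and Gauss-Hermite quadrature for each double integral. The function name `pairwise_loglik` and the parameters `beta`, `sigma2`, `phi`, `d_max` and `n_nodes` are hypothetical.

```python
import numpy as np
from scipy.stats import poisson


def pairwise_loglik(y, coords, beta, sigma2, phi, d_max, n_nodes=20):
    """Pairwise log-likelihood for a Poisson model with a Gaussian latent field.

    Assumptions (illustrative only): log link with intercept beta, latent
    variance sigma2, correlation exp(-d / phi), and only pairs of sites
    closer than d_max contribute.
    """
    # Gauss-Hermite rule: integral of exp(-x^2) f(x) dx ~ sum_k w_k f(x_k)
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    x1, x2 = np.meshgrid(nodes, nodes, indexing="ij")
    w = np.outer(weights, weights) / np.pi  # weights for two N(0, 1) variables
    s = np.sqrt(sigma2)

    total = 0.0
    n = len(y)
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = np.linalg.norm(coords[i] - coords[j])
            if d > d_max:
                continue  # distant pairs are dropped from the product
            rho = np.exp(-d / phi)
            # Map independent standard normals to the correlated pair (z_i, z_j)
            zi = np.sqrt(2.0) * s * x1
            zj = np.sqrt(2.0) * s * (rho * x1 + np.sqrt(1.0 - rho**2) * x2)
            integrand = (poisson.pmf(y[i], np.exp(beta + zi))
                         * poisson.pmf(y[j], np.exp(beta + zj)))
            # Each pair contributes the log of one double integral
            total += np.log(np.sum(w * integrand))
    return total
```

With n sites, the full likelihood is a single n-dimensional integral, whereas this function evaluates at most n(n-1)/2 two-dimensional integrals, and fewer when `d_max` excludes distant pairs. Maximizing the returned value over the parameters (for example with a general-purpose optimizer) would give a pairwise likelihood estimate under these simplifying assumptions.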
Simulation study
In this section, a simulation study is carried out to evaluate the performance of the approximate pairwise likelihood method based on realizations from the CSG random field. In addition, we compare the results of the APEM-CSN algorithm presented in this paper with those of the APEM-Normal algorithm described in [34]. We simulate random locations inside an irregular grid on a map of Romania. An isotropic exponential correlation structure for with the spatial range parameter is used, as in Eq. (8). We fix
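A simulation of this general kind can be sketched as below. The sketch rests on several assumptions that are not taken from the paper: locations are drawn uniformly on the unit square rather than on a map of Romania, the exponential correlation is written as exp(-d / phi), the responses are Poisson with a log link, and skewness is introduced through a simple shared half-normal term rather than through the CSG random field of [24]. All function and parameter names are hypothetical.

```python
import numpy as np


def exponential_corr(coords, phi):
    """Isotropic exponential correlation matrix, assumed form exp(-d / phi)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return np.exp(-dist / phi)


def simulate_skewed_counts(n, beta, sigma, phi, delta, seed=0):
    """Simulate locations, skewed latent variables and Poisson counts.

    delta in (-1, 1) controls the skewness; delta = 0 gives a Gaussian field.
    """
    rng = np.random.default_rng(seed)
    coords = rng.uniform(0.0, 1.0, size=(n, 2))   # assumed: unit square

    R = exponential_corr(coords, phi)
    # Small jitter keeps the Cholesky factorization numerically stable
    L = np.linalg.cholesky(R + 1e-10 * np.eye(n))
    gaussian_part = L @ rng.standard_normal(n)

    # Illustrative skewing device: one half-normal variable shared by all sites,
    # so each latent variable is marginally skew normal with scale sigma
    half_normal = abs(rng.standard_normal())
    latent = sigma * (delta * half_normal
                      + np.sqrt(1.0 - delta ** 2) * gaussian_part)

    counts = rng.poisson(np.exp(beta + latent))   # Poisson response, log link
    return coords, latent, counts


coords, latent, counts = simulate_skewed_counts(
    n=50, beta=1.0, sigma=0.5, phi=0.2, delta=0.8, seed=1
)
```

Repeating the call with different seeds gives independent replicates, on which an estimation method can be run to assess bias and variability of the parameter estimates.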
Application
Air pollution is one of the major problems in big cities. Poor health conditions and the number of unhealthy days are important factors in choosing where to live for many people, especially those who have respiratory and lung diseases. In this section, the number of unhealthy days at different locations in Tehran is studied based on information obtained from air pollution monitoring stations. Tehran city is the capital of Iran with a total area of 18,814 square kilometers
Conclusion
In this work, we used the approximate pairwise likelihood method for SGLM models with skew latent variables. For the latent variables, we used a general structure of the CSN distribution that includes most skew-normal distributions. This family of distributions, owing to its closure under linear transformation, marginalization, and conditioning, provided suitable conditions for obtaining conditional prediction distributions of the latent variables given the response variable. However, in some
References (41)
- et al., Pairwise likelihood inference in spatial generalized linear mixed models, Comput. Statist. Data Anal. (2005)
- Spatial generalized linear mixed models with multivariate CAR models for areal data, Spat. Stat. (2014)
- et al., Poisson cokriging as a generalized linear mixed model, Spat. Stat. (2020)
- et al., Bayesian analysis of spatial generalized linear mixed models with Laplace moving average random fields, Comput. Statist. Data Anal. (2020)
- et al., Approximate Bayesian inference in spatial GLMM with skew normal latent variables, Comput. Statist. Data Anal. (2011)
- et al., Gaussian process regression with skewed errors, J. Comput. Appl. Math. (2020)
- et al., The exact density of the sum of independent skew normal random variables, J. Comput. Appl. Math. (2017)
- et al., A Bayesian prediction using the skew Gaussian distribution, J. Statist. Plann. Inference (2004)
- et al., Skew-Gaussian random field, J. Comput. Appl. Math. (2009)
- et al., Skew-Gaussian random fields, Spat. Stat. (2014)