CC BY 4.0 license · Open Access · Published by De Gruyter, January 18, 2021

Multiple scaled symmetric distributions in allometric studies

  • Antonio Punzo and Luca Bagnato

Abstract

In allometric studies, the joint distribution of the log-transformed morphometric variables is typically symmetric and heavy-tailed. Moreover, in the bivariate case, it is customary to explain the morphometric variation of these variables by fitting a convenient line, for example the first principal component (PC). To account for all these peculiarities, we propose the use of multiple scaled symmetric (MSS) distributions. These distributions have the advantage of being directly defined in the PC space; the kind of symmetry involved is less restrictive than the commonly considered elliptical symmetry, the behavior of the tails can vary across PCs, and the first PC is less sensitive to outliers. Within the family of MSS distributions, we also propose the multiple scaled shifted exponential normal distribution, the equivalent of the multivariate shifted exponential normal distribution in the MSS framework. For the sake of parsimony, we also allow the parameter governing the leptokurtosis on each PC, in the considered MSS distributions, to be tied across PCs. From an inferential point of view, we describe an EM algorithm to estimate the parameters by maximum likelihood, illustrate how to compute standard errors of the obtained estimates, and give statistical tests and confidence intervals for the parameters. We use artificial and real allometric data to appreciate the advantages of the MSS distributions over well-known elliptically symmetric distributions and to compare the robustness of the line from our models with the lines fitted by well-established robust and non-robust methods available in the literature.

1 Introduction

Allometry can be broadly described as the study of the joint relation between parts (morphometric variables) in various organisms [1], [2], with the measurements often being log-transformed (for the practical and theoretical reasons why it is often useful to transform data to logarithms, see [3], [4], [5], [6]). The statistical models that are commonly considered are therefore inherently multivariate (almost always bivariate), and mainly aim to:

  1. reproduce the joint distribution of the log-transformed morphometric variables;

  2. summarize the morphometric variation.

Concerning the first aim, the multivariate normal (MN) distribution with mean μ and covariance matrix Σ played, and still plays, a special role. However, many empirical studies show that the MN is often not appropriate, and models have been introduced that relax the assumption of multivariate normality to the assumption that the resulting distribution is elliptically symmetric (or elliptically contoured or, simply, elliptical), still maintaining the MN model as a special case [7], [8]. This latter assumption is more biologically realistic, given that the distribution of allometric data is often heavy-tailed [9]. A well-known and simple model satisfying these requirements is the MN scale mixture (also known as MN variance mixture [10]); it is a finite/continuous mixture of MN distributions, with the same mean μ, where Σ is scaled by a convenient positive mixing random variable W whose probability mass/density function depends on a parameter vector θ governing the tail behavior (in terms of leptokurtosis) of the unconditional mixture.

As for the second aim, which is more debated in allometry [11], most multivariate studies have used principal component analysis (PCA); in this direction, the important work by [12] suggests summarizing the morphometric variation by the first principal component (PC) of the log-transformed measurements. In the special bivariate case, the problem reduces to fitting a line to the scatter plot of the log-transformed measurements and making inference about the fitted line, typically testing whether the slope equals a given value and constructing confidence intervals for slope and elevation (intercept). Commonly used line-fitting methods in allometry are linear regression, major axis (which is equivalent to the first PC), and standardized major axis. These are least squares methods differing in the direction in which errors from the line are measured [6], [11]. Advantageously, major axis methods have a natural extension to more than two dimensions ([6], page 278).

The present paper has four merits. Firstly, concerning the first aim above, by using allometric data we show the existence of situations where, although the joint distribution of the log-transformed morphometric variables can be considered symmetric and heavy-tailed, the assumption of elliptical contours may be rather restrictive, with heavy-tailed distributions accounting for alternative symmetric shapes providing a better fit. To describe these shapes, we consider the family of multiple scaled symmetric (MSS) distributions proposed by Forbes and Wraith [13]. MSS distributions can be roughly considered as an extension of the MN scale mixture based on two key elements: the decomposition of the scale matrix Σ into eigenvalue and eigenvector matrices Λ and Γ, and the use of the mixing random variable W separately for each dimension (PC) of the space spanned by the columns of Γ. The result is a multivariate heavy-tailed distribution in which the parameter vector θ is allowed to vary across PCs and whose contours are not constrained to be elliptical, although the symmetry is preserved. In allometric terms, these distributions have the advantage of being directly defined in the space of interest, namely the PC space, with each PC being distributed according to a (univariate) normal scale mixture. Moreover, regardless of the mixing distribution considered, and in analogy with the MN scale mixtures, MSS distributions enable (among other things) natural and efficient computation of the maximum likelihood (ML) estimates via the expectation-maximization (EM) algorithm, and robust ML estimation of μ and Λ based on down-weighting of mild outliers in the PC space. As a second merit of the paper, we propose a new member of the family of MSS distributions obtained by choosing a convenient shifted exponential as mixing distribution, so extending the recently proposed multivariate shifted exponential normal distribution [14] to the MSS framework. As a third merit, we introduce parsimony in the MSS distributions by constraining θ to be tied across PCs, resulting in distributions for the PCs having the same degree of leptokurtosis. In connection with the second aim discussed above, line-fitting methods are known to be quite sensitive to the presence of outliers, in the sense that the fitted line is dragged towards them; in the limit, depending on the position of the outlier(s), the resulting fitted line may be flipped into a completely different direction. As a final merit of the paper, via a simulation study and a sensitivity analysis based on bivariate allometric data, we show that the fitted line from MSS distributions, which is a major axis approach, is: 1) more robust to outliers when compared to the classical line-fitting methods discussed above, and 2) comparable with robust versions of the major axis and standardized major axis approaches based on Huber’s M-estimator in place of least squares [15].

The paper is organized as follows. In Section 2 we first recall MN scale mixtures (Section 2.1) and MSS distributions (Section 2.2), and then introduce the multiple scaled shifted exponential normal distribution and the parsimonious variants of the MSS distributions (Section 2.3). In Section 3 we focus on the bivariate case by introducing what can be defined as the major axis under MSS distributions. To simplify the reading of the paper, we postpone to Appendix A inferential aspects such as ML parameter estimation via the EM algorithm, computation of standard errors for the estimates, and formulation of confidence intervals and statistical tests. In Section 4 we describe the results of a simulation study aiming to compare, with and without outliers, the performance of the major axes from our MSS models with the lines fitted by well-established (robust) methods in allometry. In Section 5 we analyze two allometric data sets to demonstrate the better performance of the MSS models over heavy-tailed elliptical distributions which are well known in the literature and to investigate the robustness of our MSS-based line-fitting methods. We conclude the paper with a brief discussion, as well as avenues for further research, in Section 6.

2 Methodology

2.1 Multivariate normal scale mixture

For the sake of robustness, one of the most common ways to generalize the MN distribution is the MN scale mixture (MNSM), with probability density function (pdf)

(1) $f_{\mathrm{MNSM}}(\mathbf{x};\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\theta})=\int_{0}^{\infty} f_{\mathrm{MN}}(\mathbf{x};\boldsymbol{\mu},\boldsymbol{\Sigma}/w)\,m(w;\boldsymbol{\theta})\,\mathrm{d}w,$

where $\mathbf{X}$ is the d-variate vector of log-transformed morphometric variables, $f_{\mathrm{MN}}(\cdot;\boldsymbol{\mu},\boldsymbol{\Sigma})$ denotes the pdf of the MN distribution with mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$, in symbols $\mathcal{N}_d(\boldsymbol{\mu},\boldsymbol{\Sigma})$, and $m(w;\boldsymbol{\theta})$ is the mixing probability density (or mass) function of W, with positive support, depending on the parameter vector $\boldsymbol{\theta}$, in symbols $W\sim\mathcal{M}(\boldsymbol{\theta})$. If $\mathbf{X}$ has the pdf in (1), we write $\mathbf{X}\sim\mathrm{NSM}_d(\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\theta})$. For the MNSM model, $\boldsymbol{\Sigma}$ is a scale matrix.

The MNSM in (1) simply alters the tail behavior of the conditional (reference) MN distribution, allowing for leptokurtosis, while leaving the resulting distribution unimodal and elliptically symmetric (for further details about the properties of the MNSM family of distributions see, e.g., [16], [17], [18], [19], [20]). Moreover, as detailed in [21], the covariance matrix and kurtosis of $\mathbf{X}\sim\mathrm{NSM}_d(\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\theta})$ are, respectively,

(2) $\mathrm{Var}(\mathbf{X})=v(\boldsymbol{\theta})\,\boldsymbol{\Sigma},$
(3) $\mathrm{Kurt}(\mathbf{X})=k(\boldsymbol{\theta})\,d(d+2),$

where $v(\boldsymbol{\theta})=E(V)$ is the variance factor, $k(\boldsymbol{\theta})=E(V^2)/[E(V)]^2$ is the kurtosis factor, and V = 1/W. As we can note from (3), the kurtosis is governed by θ only. Moreover, because $E(V^2)\geq[E(V)]^2$, the excess kurtosis (with respect to the MN distribution) is non-negative, with equality holding only in the degenerate case W ≡ 1. Therefore, apart from this limit case, the resulting distribution is leptokurtic, and θ can be interpreted as the tailedness parameter of the MNSM model.

A more direct interpretation of $\mathbf{X}\sim\mathrm{NSM}_d(\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\theta})$ can be given by its hierarchical representation

$$W\sim\mathcal{M}(\boldsymbol{\theta}),\qquad \mathbf{X}\mid W=w \sim \mathcal{N}_d(\boldsymbol{\mu},\boldsymbol{\Sigma}/w).$$

This alternative way to see the MNSM distribution is useful for random generation and for the implementation of the expectation-maximization (EM) algorithm.
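For illustration, the following minimal R sketch exploits the representation to draw from an MNSM distribution (the function name rmnsm and the example settings are ours; the mvtnorm package is assumed):

```r
# Hierarchical sampling from an MNSM: draw W ~ M(theta), then
# X | W = w ~ N_d(mu, Sigma / w).
library(mvtnorm)

rmnsm <- function(n, mu, Sigma, rmix) {
  w <- rmix(n)  # mixing draws W_1, ..., W_n
  t(sapply(w, function(wi) rmvnorm(1, mean = mu, sigma = Sigma / wi)))
}

# Example: a bivariate t with theta = 5, i.e. W ~ Gamma(theta/2, theta/2);
# see (4) below.
set.seed(1)
x <- rmnsm(500, mu = c(0, 0), Sigma = diag(2),
           rmix = function(n) rgamma(n, shape = 5/2, rate = 5/2))
```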

Two members of the MNSM family of distributions, which will be mainly considered in this paper, are the multivariate t (Mt) [22], [23] and the multivariate shifted exponential normal (MSEN) [14]. The Mt is obtained when W has a gamma distribution with shape and rate both equal to θ/2, θ > 0, that is, when W has pdf

(4) $m(w;\theta)=\dfrac{\left(\frac{\theta}{2}\right)^{\frac{\theta}{2}}}{\Gamma\left(\frac{\theta}{2}\right)}\,w^{\frac{\theta}{2}-1}\exp\left(-\dfrac{\theta}{2}\,w\right),\quad w>0.$

If W has the pdf in (4), for notational convenience we write $W\sim\mathcal{G}(\theta/2,\theta/2)$. The MSEN is obtained when W has a shifted exponential distribution, with rate θ > 0 and support (1, ∞); in symbols, $W\sim\mathcal{SE}(\theta)$. The pdf of $W\sim\mathcal{SE}(\theta)$ is

$$m(w;\theta)=\theta\exp\left[-\theta(w-1)\right],\quad w>1.$$

Further examples of distributions belonging to the MNSM family are: the multivariate contaminated normal (MCN) [24], the multivariate tail-inflated normal (MTIN) [21], and the multivariate symmetric generalized hyperbolic (MSGH). The MSGH distribution, in turn, nests many other well-known MNSM distributions such as the multivariate symmetric hyperbolic (MSH), the multivariate symmetric variance-gamma (MSVG), and the multivariate symmetric normal inverse Gaussian (SNIG); for details about the MSGH distribution, along with its special cases, see [20], Example 3.8.

2.2 Multiple scaled symmetric distributions

Forbes and Wraith [13] propose an alternative way to alter the tail behavior of the conditional MN distribution in (1), while leaving the resultant distribution unimodal and symmetric. The idea roughly consists in:

  1. using the classical eigen-decomposition

    (5) $\boldsymbol{\Sigma}=\boldsymbol{\Gamma}\boldsymbol{\Lambda}\boldsymbol{\Gamma}',$

    where Γ is a d × d matrix of eigenvectors of Σ and Λ is the corresponding d × d diagonal matrix of eigenvalues;

  2. generalizing the univariate mixing variable W in (1) to a d-variate mixing variable $\mathbf{W}=(W_1,\ldots,W_d)'$ with independent variates and joint probability density (or mass) function

    $$m(\mathbf{w};\boldsymbol{\theta})=\prod_{h=1}^{d} m(w_h;\theta_h),$$

    where $\boldsymbol{\theta}=(\theta_1,\ldots,\theta_d)'$. For notational convenience we write $\mathbf{W}\sim\mathcal{M}_d(\boldsymbol{\theta})$.

Note that each element on the right-hand side of (5) has a different geometric interpretation: Λ determines the size and shape of the scatter, while Γ determines its orientation. The resulting multiple scaled symmetric (MSS) distributions have pdf

(6) $f_{\mathrm{MSS}}(\mathbf{x};\boldsymbol{\mu},\boldsymbol{\Gamma},\boldsymbol{\Lambda},\boldsymbol{\theta})=\int_0^\infty\cdots\int_0^\infty f_{\mathrm{MN}}\left(\mathbf{x};\boldsymbol{\mu},\boldsymbol{\Gamma}\boldsymbol{\Delta}_{\mathbf{w}}\boldsymbol{\Lambda}\boldsymbol{\Gamma}'\right)m(w_1,\ldots,w_d;\boldsymbol{\theta})\,\mathrm{d}w_1\cdots\mathrm{d}w_d,$

where $\boldsymbol{\Delta}_{\mathbf{w}}=\mathrm{diag}(w_1^{-1},\ldots,w_d^{-1})$. If $\mathbf{X}$ has the distribution in (6) we write $\mathbf{X}\sim\mathrm{MSS}_d(\boldsymbol{\mu},\boldsymbol{\Gamma},\boldsymbol{\Lambda},\boldsymbol{\theta})$.

The MSS distributions have a convenient hierarchical representation too. In detail, $\mathbf{X}\sim\mathrm{MSS}_d(\boldsymbol{\mu},\boldsymbol{\Gamma},\boldsymbol{\Lambda},\boldsymbol{\theta})$ can be represented as

$$\mathbf{W}\sim\mathcal{M}_d(\boldsymbol{\theta}),\qquad \mathbf{X}\mid\mathbf{W}=\mathbf{w}\sim\mathcal{N}_d(\boldsymbol{\mu},\boldsymbol{\Gamma}\boldsymbol{\Delta}_{\mathbf{w}}\boldsymbol{\Lambda}\boldsymbol{\Gamma}').$$
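A minimal R sketch of this representation (function name and settings ours; the mvtnorm package is assumed) makes the difference with respect to the MNSM explicit: each observation receives one mixing draw per PC, rather than a single shared draw:

```r
# Hierarchical sampling from an MSS distribution: one mixing draw per PC, so
# the conditional scale matrix is Gamma %*% diag(lambda / w) %*% t(Gamma).
library(mvtnorm)

rmss <- function(n, mu, Gamma, lambda, rmix_list) {
  d <- length(mu)
  t(replicate(n, {
    w <- sapply(rmix_list, function(r) r(1))  # w_1, ..., w_d
    S <- Gamma %*% diag(lambda / w, d) %*% t(Gamma)
    drop(rmvnorm(1, mean = mu, sigma = S))
  }))
}

# Example: a bivariate MSt with heavier tails on the first PC.
set.seed(1)
G <- cbind(c(1, 1), c(-1, 1)) / sqrt(2)  # orthogonal eigenvector matrix
x <- rmss(500, mu = c(0, 0), Gamma = G, lambda = c(3, 1),
          rmix_list = list(function(n) rgamma(n, 1.5, 1.5),  # theta_1 = 3
                           function(n) rgamma(n, 5, 5)))     # theta_2 = 10
```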

For the sake of interpretation, the pdf in (6) can be equivalently written as

(7) $f_{\mathrm{MSS}}(\mathbf{x};\boldsymbol{\mu},\boldsymbol{\Gamma},\boldsymbol{\Lambda},\boldsymbol{\theta})=\prod_{h=1}^{d}\int_0^\infty f_{\mathrm{N}}\left(\left[\boldsymbol{\Gamma}'(\mathbf{x}-\boldsymbol{\mu})\right]_h;0,\lambda_h/w_h\right)m(w_h;\theta_h)\,\mathrm{d}w_h=\prod_{h=1}^{d} f_{\mathrm{NSM}}\left(\left[\boldsymbol{\Gamma}'(\mathbf{x}-\boldsymbol{\mu})\right]_h;0,\lambda_h,\theta_h\right),$

where $\boldsymbol{\Gamma}'(\mathbf{x}-\boldsymbol{\mu})=\mathbf{y}$ is the principal components (PCs) transform of $\mathbf{x}$ and can be thought of as a rotation and recentering of $\mathbf{x}$, $[\boldsymbol{\Gamma}'(\mathbf{x}-\boldsymbol{\mu})]_h=y_h$ is the hth element of $\mathbf{y}$, and $\lambda_h$ is the hth diagonal element of Λ (or, equivalently, the hth eigenvalue of Σ). Note that, in $\mathbf{Y}$, the variables $Y_1,\ldots,Y_d$ are ordered so that $\mathrm{Var}(Y_1)\geq\mathrm{Var}(Y_2)\geq\cdots\geq\mathrm{Var}(Y_d)$, in analogy with classical PCA. In (7), the subscripts “N” and “NSM” are used instead of “MN” and “MNSM”, respectively, to denote the univariate case. It is straightforward to realize that $Y_h\sim\mathrm{NSM}_1(0,\lambda_h,\theta_h)$. Finally, note that, apart from the univariate case d = 1, the MSS distribution is not a special or limiting case of the MNSM distribution.

Concerning the estimation of the parameters μ, Γ, Λ, and θ, several estimators may be considered. Among them, the maximum likelihood (ML) estimator is the most attractive because of its asymptotic properties (see, e.g., [25], p. 206). Details about the expectation-maximization (EM) algorithm used to obtain ML estimates of the parameters of MSS distributions are given in Appendix A.1.

2.3 Special cases of MSS distributions and parsimony

In the MSS family of distributions, Forbes and Wraith [13] introduce the multiple scaled t (MSt) distribution. It is obtained by setting $W_h\sim\mathcal{G}(\theta_h/2,\theta_h/2)$. For the MSt, the variance and kurtosis factors for each PC are respectively given by

(8) $v(\theta_h)=\dfrac{\theta_h}{\theta_h-2}\quad\text{and}\quad k(\theta_h)=\dfrac{\theta_h-2}{\theta_h-4},$

where, in general, $\mathrm{Var}(Y_h)=\lambda_h\,v(\theta_h)$ and $\mathrm{Kurt}(Y_h)=3\,k(\theta_h)$, h = 1, …, d; compare with Eqs. (2) and (3).

In this paper we introduce a new member of the MSS family, the multiple scaled SEN (MSSEN) distribution, obtained by assuming $W_h\sim\mathcal{SE}(\theta_h)$. For the MSSEN, the variance and kurtosis factors for each PC are respectively given by

(9) $v(\theta_h)=\theta_h\exp(\theta_h)\,\varphi_{-1}(\theta_h)\quad\text{and}\quad k(\theta_h)=\dfrac{\theta_h}{v(\theta_h)^2}\left[1-v(\theta_h)\right],$

where

$$\varphi_m(z)=E_{-m}(z)=\int_1^\infty t^m\exp(-zt)\,\mathrm{d}t=z^{-(m+1)}\,\Gamma(m+1,z)$$

is the Misra function [26], a generalization of the generalized exponential integral function $E_n(z)=\int_1^\infty t^{-n}\exp(-zt)\,\mathrm{d}t$ [27], with $\Gamma(n,z)=\int_z^\infty t^{n-1}\exp(-t)\,\mathrm{d}t$ denoting the upper incomplete gamma function; for further details, see [14]. The MSt and MSSEN are the two special cases of MSS distributions we will consider in this paper.
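A direct R transcription of $\varphi_m(z)$ is sketched below (function names ours); the identity $\varphi_m(z)=z^{-(m+1)}\Gamma(m+1,z)$ can be exploited through pgamma() when m + 1 > 0, with a numerical integration fallback otherwise:

```r
# Misra function phi_m(z) = int_1^Inf t^m exp(-z t) dt = z^(-(m+1)) Gamma(m+1, z).
misra <- function(m, z) {
  if (m + 1 > 0) {
    # Gamma(m+1, z) = gamma(m+1) * pgamma(z, m+1, lower.tail = FALSE)
    z^(-(m + 1)) * gamma(m + 1) * pgamma(z, shape = m + 1, lower.tail = FALSE)
  } else {
    # e.g. m = -1, as needed by v(theta_h) in (9)
    integrate(function(t) t^m * exp(-z * t), lower = 1, upper = Inf)$value
  }
}

# Variance and kurtosis factors of the SEN, as in (9).
v_sen <- function(theta) theta * exp(theta) * misra(-1, theta)
k_sen <- function(theta) theta * (1 - v_sen(theta)) / v_sen(theta)^2
```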

To evaluate the data configurations favoring the MSt over the MSSEN, and vice versa, it is enough to consider the univariate t and SEN distributions. Their main difference lies in the joint behavior of the variance and kurtosis factors v(θ_h) and k(θ_h), h = 1, …, d; refer to Figure 1. In particular, with λ_h kept fixed, for the t distribution larger values of kurtosis are associated with larger values of variance (cf. Figure 1(a)) while, for the SEN distribution, larger values of kurtosis are associated with smaller values of variance (cf. Figure 1(b)); see [28].

Figure 1: Multiplicative factors v(θ_h) (solid line) and k(θ_h) (dashed line).

In models (6) and (7) there is a θ_h-parameter for each PC. This means that, as the dimension d grows, the total number of free parameters to be estimated can quickly become large, leading to overfitting. To introduce parsimony, we allow the d tailedness parameters to be tied across PCs, i.e. we assume θ1 = ⋯ = θ_d. This results in normal scale mixtures, for the PCs, having the same degree of leptokurtosis. When this constraint is applied to the MSt and MSSEN distributions, we will simply write C-MSt and C-MSSEN, respectively.

3 Line-fitting from bivariate MSS distributions

As emphasized in Section 1, allometry mainly focuses on the bivariate case (d = 2), where $\mathbf{X}=(X_1,X_2)'$. In this case, we would like to make inference about the elevation (intercept) α and the slope β of the line describing the relationship between X1 and X2. Operationally, we are often interested in testing whether β equals a particular value, and in estimating confidence intervals for α and β. Therefore, it is important: 1) to evaluate how the MSS distributions specialize in the bivariate case, and 2) to parametrize their pdf in terms of α and β.

First of all, it is important to underline that the line determined under MSS distributions belongs to the class of the major axis (or first PC axis) methods; as such, the line we are interested in is the axis of Y1.

There are different ways to parametrize the MSS distributions, in the bivariate case, in terms of α and β. We decided to express Γ and μ2 (the mean of X2) in terms of α and β in the following way:

$$\boldsymbol{\Gamma}=\begin{pmatrix}\dfrac{1}{\sqrt{1+\beta^2}} & -\dfrac{\beta}{\sqrt{1+\beta^2}}\\[2mm] \dfrac{\beta}{\sqrt{1+\beta^2}} & \dfrac{1}{\sqrt{1+\beta^2}}\end{pmatrix}\quad\text{and}\quad \mu_2=\alpha+\beta\mu_1,$$

where μ1 is the mean of X1. According to this choice, the pdf of the MSt distribution becomes

(10) $f_{\mathrm{MSt}}(x_1,x_2;\boldsymbol{\Psi})=\dfrac{\Gamma\left(\frac{\theta_1+1}{2}\right)\Gamma\left(\frac{\theta_2+1}{2}\right)}{\pi\,\Gamma\left(\frac{\theta_1}{2}\right)\Gamma\left(\frac{\theta_2}{2}\right)\sqrt{\lambda_1\lambda_2\theta_1\theta_2}}\left(1+\dfrac{\left[x_1+\beta(x_2-\alpha)-\mu_1(1+\beta^2)\right]^2}{\lambda_1\theta_1(1+\beta^2)}\right)^{-\frac{\theta_1+1}{2}}\cdot\left(1+\dfrac{(x_2-\beta x_1-\alpha)^2}{\lambda_2\theta_2(1+\beta^2)}\right)^{-\frac{\theta_2+1}{2}},$

while the pdf of the MSSEN distribution becomes

(11) $f_{\mathrm{MSSEN}}(x_1,x_2;\boldsymbol{\Psi})=\dfrac{\theta_1\theta_2\exp(\theta_1+\theta_2)}{2\pi\sqrt{\lambda_1\lambda_2}}\,\varphi_{\frac{1}{2}}\left(\dfrac{\left[x_1+\beta(x_2-\alpha)-\mu_1(1+\beta^2)\right]^2}{2\lambda_1(1+\beta^2)}+\theta_1\right)\cdot\varphi_{\frac{1}{2}}\left(\dfrac{(x_2-\beta x_1-\alpha)^2}{2\lambda_2(1+\beta^2)}+\theta_2\right),$

where Ψ = (α, β, μ1, λ1, λ2, θ1, θ2)′.
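For illustration, a literal R transcription of the MSSEN pdf (11) is given below (the function name is ours; it reuses the misra() helper sketched in Section 2.3):

```r
# Bivariate MSSEN density (11), parametrized by elevation alpha, slope beta,
# mu1, eigenvalues lambda1, lambda2, and tail parameters theta1, theta2.
dmssen2 <- function(x1, x2, alpha, beta, mu1,
                    lambda1, lambda2, theta1, theta2) {
  q1 <- (x1 + beta * (x2 - alpha) - mu1 * (1 + beta^2))^2 /
        (2 * lambda1 * (1 + beta^2))  # squared first-PC score over 2*lambda1
  q2 <- (x2 - beta * x1 - alpha)^2 / (2 * lambda2 * (1 + beta^2))
  theta1 * theta2 * exp(theta1 + theta2) / (2 * pi * sqrt(lambda1 * lambda2)) *
    misra(1/2, q1 + theta1) * misra(1/2, q2 + theta2)
}
```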

Figure 2 shows, via isodensities, some possible shapes of the bivariate C-MSt and C-MSSEN distributions, obtained by fixing α, β, and μ1 (equivalent to fixing the mean vector μ and the orientation matrix Γ), and by varying λ1, λ2, and the common θ so as to produce PCs with given variances and kurtoses; refer to Section 2.3. In detail, in all the plots, μ1 = 0, α = 0, and β = 1; note that in this case μ2 = 0. Variances and kurtoses of the two PCs, along with the values of λ1, λ2, and θ, are given in Table 1. Figure 2 clearly shows how the shape of the MSS distributions is not constrained to be elliptical, although the symmetry is preserved. In particular, the choices made for θ (refer to Table 1) produce “smoothed” rhomboidal (Figure 2(a) and (b)) and starred (Figure 2(c) and (d)) contours.

Figure 2: Examples of contour plots of bivariate MSS distributions with common θ, where μ1 = 0, α = 0, and β = 1. The values of λ_h, h = 1, 2, and θ are given in Table 1.

Table 1:

Variances and kurtoses of PCs, along with the values of λ1, λ2, and θ, for the contour plots of Figure 2.

(a) Figure 2(a)
       Variance  λ           Kurtosis  θ
  Y1   3         2.4         4         10
  Y2   1         0.8         4         10

(b) Figure 2(b)
       Variance  λ           Kurtosis  θ
  Y1   3         7.2790      4         0.3853
  Y2   1         2.4263      4         0.3853

(c) Figure 2(c)
       Variance  λ           Kurtosis  θ
  Y1   2         1.0037      400       4.0151
  Y2   1         0.5019      400       4.0151

(d) Figure 2(d)
       Variance  λ           Kurtosis  θ
  Y1   2         2322.1544   400       0.0001
  Y2   1         1151.3737   400       0.0001

4 Simulation study: comparison of line-fitting methods

In this section, we investigate the behavior of the fitted lines from our four bivariate MSS models – MSSEN, MSt, C-MSSEN, and C-MSt – through a large-scale simulation study performed using R [29]. We further provide a comparison with the following classical line-fitting methods: ordinary least-squares (OLS) regression, OLS bisector, major axis (MA; also known as orthogonal regression), standardized major axis (SMA; also known as reduced major axis), robust MA (R-MA), and robust SMA (R-SMA), the last two approaches being based on Huber’s M-estimator; see [6], [11], [15] for details. Emphasis is given to the evaluation of the impact of outliers on the fitted slope of the competing methods.

To generate the data we consider a bivariate normal $\mathcal{N}_2(\boldsymbol{\mu},\boldsymbol{\Sigma})$. We fix $\boldsymbol{\mu}=\mathbf{0}$ and $\boldsymbol{\Sigma}=\boldsymbol{\Gamma}\boldsymbol{\Lambda}\boldsymbol{\Gamma}'$, where

$$\boldsymbol{\Gamma}=\begin{pmatrix}\dfrac{1}{\sqrt{2}} & -\dfrac{1}{\sqrt{2}}\\[2mm] \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}}\end{pmatrix}\quad\text{and}\quad \boldsymbol{\Lambda}=\begin{pmatrix}\dfrac{1}{\lambda} & 0\\[2mm] 0 & \lambda\end{pmatrix},$$

with λ ∈ (0, 1) governing the axis-ratio of the elliptical scatter: as λ approaches 1, the ellipse becomes more and more similar to a circle while, on the other side, as λ approaches 0, the ellipse degenerates onto its major axis. With the notation of Section 3, this is equivalent to saying that we generate data from a bivariate normal distribution whose elevation, slope, and mean of X1 are α = 0, β = 1, and μ1 = 0, respectively. If we want outliers to be included in the sample, we generate them from a bivariate normal $\mathcal{N}_2(\boldsymbol{\mu}_{\text{out}},\boldsymbol{\Sigma}_{\text{out}})$, where $\boldsymbol{\mu}_{\text{out}}=q\cdot(1,-1)'$ and $\boldsymbol{\Sigma}_{\text{out}}=0.01\cdot\boldsymbol{\Sigma}$, with q governing the distance of the outliers from the bulk of the data (centered at $\boldsymbol{\mu}=\mathbf{0}$).
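A minimal R sketch of this data generating process (the function name is ours; the mvtnorm package is assumed):

```r
# DGP of the simulation study: bulk from N_2(0, Sigma) with slope beta = 1,
# plus a proportion p of outliers at distance q, orthogonal to the major axis.
library(mvtnorm)

gen_data <- function(n, p, lambda, q) {
  G     <- cbind(c(1, 1), c(-1, 1)) / sqrt(2)
  Sigma <- G %*% diag(c(1 / lambda, lambda)) %*% t(G)
  n_out <- round(n * p)
  x     <- rmvnorm(n - n_out, mean = c(0, 0), sigma = Sigma)
  if (n_out > 0)
    x <- rbind(x, rmvnorm(n_out, mean = q * c(1, -1), sigma = 0.01 * Sigma))
  x
}

set.seed(1)
x <- gen_data(n = 200, p = 0.05, lambda = 0.3, q = 8)  # setting of Figure 3
```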

We consider four experimental factors: the sample size (n ∈ {100, 200}), the proportion of simulated outliers (p ∈ {0, 0.05, 0.10}), λ ∈ {0.2, 0.3}, and q ∈ {8, 10}. In Figure 3 we show an example of a generated data set, with outliers, in the case n = 200, p = 0.05, λ = 0.3, and q = 8. The contamination scheme is not chosen randomly: it is similar to a classical contamination scheme used to evaluate robustness for MA methods (see, e.g., [30], [31], [32], [33]). We generate 100 data sets for each combination of (n, λ) when p = 0, and for each combination of (n, p, λ, q) when p > 0; the difference is due to the fact that, when p = 0, there is no reason to consider the experimental factor q. This yields a total of 20 scenarios (2^2 = 4 in the case p = 0, and 2^4 = 16 otherwise) and 2000 generated data sets (100 × 2^2 = 400 in the case p = 0, and 100 × 2^4 = 1600 for p > 0). For each generated data set, we fit the 10 competing models/lines discussed above. The MA, SMA, R-MA and R-SMA methods are estimated using the smatr package for R [34].

Figure 3: Example of simulated data, with outliers, in the case n = 200, p = 0.05, λ = 0.3, and q = 8.

For comparison’s sake, for each scenario and each line-fitting method, we display the box-plots of the estimated slopes over the 100 replications. While in the 4 scenarios without outliers we expect all the methods to work well and comparably, in the 16 scenarios with outliers we hope that the estimated slope is not flipped into the opposite direction.

Figure 4 reports the obtained results for the scenarios without outliers. A horizontal dashed gray line is placed at β = 1 to provide, as a benchmark, the true but unknown value of β in the data generating process (DGP). Note that we use a larger than necessary range for the y-axis simply to have the same range for all the box-plots of this simulation study (compare Figures 4–6); this should make it easier to compare results across scenarios. Apart from the OLS regression, the competing methods perform comparably, with a variability of the estimates that decreases either as n grows (from 100 to 200) or as λ decreases (from 0.3 to 0.2). The different behavior of the OLS estimates is due to the fact that the DGP violates the zero conditional mean assumption of the OLS regression (see, e.g., [35], Chapter 2).

Figure 4: Box-plots, over 100 replications, of the estimated slopes in the scenarios without outliers (p = 0). Each plot refers to a different pair (n, λ).

Figure 5: Box-plots, over 100 replications, of the estimated slopes in the scenarios with q = 8. Each plot refers to a different triplet (n, λ, p).

In Figures 5 and 6 we report the obtained results for the scenarios with q = 8 and q = 10, respectively. Regardless of the considered scenario, the estimated slopes from our MSS distributions, like those from R-MA and R-SMA, are robust to the presence of outliers, with a behavior very similar to that observed in the scenarios without contamination (compare with Figure 4); the similarity also holds in terms of variability of the estimates, which decreases when either n increases or λ decreases. The only difference between the estimated slopes from our MSS distributions and those from R-MA and R-SMA is a slightly larger variability for the estimates from the latter methods, especially when λ grows. Interestingly enough, the proportion of outliers p does not seem to have an impact on the behavior of the estimates. Instead, the bisector, MA, and SMA line-fitting methods are strongly affected by outliers, with a median estimated β (over the 100 replications) always close to −1, the opposite of the true β. The OLS regression provides box-plots that are mainly located on values of β between 0 and −1. Although the OLS estimates are also in this case very far from the true value β = 1, they are slightly better than those from the bisector, MA, and SMA methods.

Figure 6: Box-plots, over 100 replications, of the estimated slopes in the scenarios with q = 10. Each plot refers to a different triplet (n, λ, p).

5 Real data analyses

In this section we illustrate two applications of our models on real allometric data. The first one (Section 5.1) considers bivariate data; one of the aims is to evaluate how a single outlier affects the behavior of the competing line-fitting methods already considered in the simulation study of Section 4. The second analysis (Section 5.2) fills one of the gaps of Section 4 by showing an application of our models to data with more than two measurements.

In line with the considerations of Section 1, both analyses also fill another gap with respect to the simulation study of Section 4, i.e. the evaluation of the ability of our models to reproduce the joint distribution of the log-transformed morphometric variables. We also compare our models with the following multivariate elliptical (heavy-tailed) distributions given in Section 2.1: Mt, MTIN, MSEN, MLN, MCN, MSVG, MSH, SNIG, and MSGH. With MLN we denote the multivariate leptokurtic normal distribution introduced by [36].

In analogy with Section 4, the analysis is entirely conducted in R. Parameters are estimated via the ML approach. For MSS distributions, we adopt the EM algorithm illustrated in Appendix A.1. We start the algorithm from the M-step, providing initial values for $\dot{w}_{ih}$ and $\dot{\boldsymbol{\Gamma}}$ (refer to Appendix A.1.2). In detail, we fix $\dot{\boldsymbol{\Gamma}}$ as the eigenvector matrix of the sample covariance matrix, so that $\dot{\boldsymbol{\Gamma}}$ does not depend on the considered distribution, while we differentiate $\dot{w}_{ih}$ according to the considered model: we fix $\dot{w}_{ih}=1$ for the MSt and C-MSt, and $\dot{w}_{ih}=2$ for the MSSEN and C-MSSEN, i = 1, …, n and h = 1, …, d. The code for density evaluation, random number generation, and fitting for the considered MSS distributions is available at http://docenti.unict.it/punzo/Rcode.htm. As concerns the competing models, we use the WML.MLN() function, from the code available at the same URL, to fit the MLN, the mtin.ML.ECME() function of the mtin package to fit the MTIN, the msen.ML.EM() function of the msen package to fit the MSEN, the CNmixt() function of the ContaminatedMixt package [37] to fit the MCN, and the ghyp package [38] to fit the remaining models.

Since the competing models have differing numbers of parameters, their goodness-of-fit comparison is made, as usual, via the Akaike information criterion (AIC) [39] and the Bayesian information criterion (BIC) [40] which, in our formulation, need to be maximized. Moreover, we use likelihood-ratio (LR) tests not only to make inference on the parameters (refer to Appendix A.3) but also to: 1) compare the MN distribution (null model) with each of the competing distributions (alternative models), and 2) test for MSS-parsimony (refer to the last part of Section 2.3), that is, to compare the constrained (null) and unconstrained (alternative) MSS models based on the same mixing random variable. In the sequel, the obtained p-values will always be compared to the classical 5% significance level.
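In this maximized formulation – assuming the standard forms AIC = 2l − 2#par and BIC = 2l − #par · log n, which are consistent with the values reported in Tables 2 and 5 – the two criteria reduce to two lines of R:

```r
# AIC and BIC in the maximized formulation: larger values are better.
aic <- function(loglik, npar)    2 * loglik - 2 * npar
bic <- function(loglik, npar, n) 2 * loglik - npar * log(n)

aic(225.347, 7)      # ~436.69, MSSEN on the m.twins data (cf. Table 2)
bic(225.347, 7, 89)  # ~419.27
```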

5.1 Anthropometric measurements of male twins

The first analysis considers the m.twins data set accompanying the Flury package [41] for R. The data set contains six anthropometric measurements for n = 89 pairs of monozygotic/dizygotic male twins measured in the 1950s at the University of Hamburg, Germany. For details on these data, see [42]. We focus on the logarithms of d = 2 of the available measurements: the chest circumferences of the first and second twin. The variables are expressed in centimeters. The scatter plot of the log-transformed data is displayed in Figure 7.

Figure 7: Scatter plot of the log-transformed m.twins data.

The first aim of this analysis is to evaluate the best model for the bivariate distribution of the data. The scatter in Figure 7 seems symmetric enough, and this is corroborated by the Mardia test of symmetry (p-value = 0.444), as implemented by the mvn() function of the MVN package [43]. Table 2 presents the model comparison. AIC and BIC provide the same ranking for the best four models. They are all MSS distributions, with the MSSEN being the overall best, followed by the other unconstrained model (MSt). The top-four models also have the same ranking induced by the p-values of the LR test of bivariate normality. In particular, the null hypothesis of bivariate normality is rejected for the unconstrained MSS models only (0.003 for the MSSEN and 0.004 for the MSt). The better performance of the unconstrained MSS models is also corroborated by the p-values from the LR test of MSS-parsimony given in the last column of Table 2: the MSSEN should be preferred to the C-MSSEN (p-value = 0.003), and the MSt should be preferred to the C-MSt (p-value = 0.007). Figure 8 shows the scatter plot of the log-transformed data with superimposed isodensities of the selected MSSEN model. Summarizing, the results in Table 2, as well as the plot in Figure 8, underline the need for types of symmetry other than the elliptical one.

Table 2:

Model comparison, for the m.twins data, in terms of number of parameters (#par), log-likelihood, AIC, BIC, and p-values from the LR tests of multivariate normality and MSS-parsimony. Rankings induced by the considered criteria are also displayed.

Model     #par  Log-Lik.  AIC      Rank  BIC      Rank  Normality LR p-value  Rank  MSS-parsimony LR p-value
MN         5    213.822   417.644   12   405.201    5   –                      –    –
Mt         6    215.653   419.306    6   404.374    6   0.176                  5    –
MTIN       6    215.594   419.188    7   404.256    7   0.183                  6    –
MSEN       6    214.711   417.421   13   402.489   12   0.346                 12    –
MLN        6    214.978   417.957   11   403.025   11   0.282                 11    –
MCN        7    216.943   419.885    5   402.465   13   0.210                  7    –
MSVG       6    215.093   418.186   10   403.254   10   0.260                 10    –
MSH        6    215.264   418.529    9   403.597    9   0.230                  9    –
SNIG       6    215.360   418.721    8   403.789    8   0.215                  8    –
MSGH       7    215.653   417.306   14   399.886   14   0.400                 13    –
MSSEN      7    225.347   436.693    1   419.273    1   0.003                  1    0.003
MSt        7    224.689   435.377    2   417.957    2   0.004                  2    0.007
C-MSSEN    6    216.552   421.103    4   406.172    4   0.099                  4    –
C-MSt      6    217.492   422.983    3   408.051    3   0.055                  3    –
Figure 8: Scatter plot of the log-transformed m.twins data with superimposed isodensities from the ML-fitted MSSEN model.

The second aim of the analysis is to compare the behavior of the line-fitting methods considered in Section 4. Table 3 reports the estimated elevation $\hat{\alpha}$ and slope $\hat{\beta}$, along with approximate 95% confidence intervals (CIs). For the bisector method and for all our models, these intervals are based on the asymptotic normality of the estimators and are computed as $\hat{\alpha}\pm 1.96\,\widehat{\mathrm{se}}(\hat{\alpha})$ and $\hat{\beta}\pm 1.96\,\widehat{\mathrm{se}}(\hat{\beta})$; see Appendix A.3 for details. Standard errors are computed as described in [11] for the bisector method, and using the score-based approach illustrated in Appendix A.2 for our models. For the remaining methods, approximate 95% CIs are those provided by the smatr package (see [15] for details). Moreover, because of its interest in allometry, we also evaluate whether the chest circumferences of the first and second twin are directly proportional. If this is the case, the log-transformed variables will be linearly related with a slope of 1. Hence, we simply need to test the null hypothesis H0: β = 1. We do so by checking whether the value of the slope under the null is included in the approximate 95% CI. For our models only, we also show the p-values from the Wald and LR tests outlined in Appendix A.3.

Table 3:

Estimates and 95% confidence intervals (CIs) for the parameters α and β of the competing lines fitted to the m.twins data in Figure 7. The last two columns report the p-values of the Wald and LR tests for H0:β = 1.

                     95% CI for α                    95% CI for β          H0: β = 1 (p-value)
Method     α̂         2.5%      97.5%      β̂         2.5%      97.5%      Wald     LR
OLS        0.2583    −0.0494   0.5659     0.9398    0.8683    1.0113     –        –
MA         0.0089    −0.3177   0.3356     0.9978    0.9245    1.0768     –        –
SMA        0.0084    −0.2993   0.3161     0.9979    0.9289    1.0720     –        –
R-MA       0.0464    −0.2069   0.2996     0.9893    0.9321    1.0500     –        –
R-SMA      0.0449    −0.2003   0.2901     0.9896    0.9343    1.0483     –        –
Bisector   0.0084    −0.2468   0.2635     0.9979    0.9385    1.0573     –        –
MSSEN      0.0820    −0.1519   0.3159     0.9810    0.9263    1.0356     0.4942   0.2974
MSt        0.0733    −0.2043   0.3509     0.9831    0.9184    1.0477     0.6074   1.0000
C-MSSEN    0.0601    −0.2471   0.3673     0.9861    0.9147    1.0575     0.7019   0.6295
C-MSt      0.0706    −0.2275   0.3687     0.9837    0.9144    1.0529     0.6440   0.5766

From the obtained results we can note that OLS regression behaves differently from the other competing methods, providing a larger $\hat{\alpha}$ and, in line with the simulation results of Section 4, a smaller $\hat{\beta}$. Moreover, its 95% CIs for elevation and slope do not contain the values α = 0 and β = 1; these values would represent a direct proportionality between the chest circumferences of the first and second twin (β = 1) with unitary proportionality constant (α = 0). The remaining methods behave comparably, providing an estimated slope close to 1 and an estimated elevation close to 0; their 95% CIs for elevation and slope contain the reference values α = 0 and β = 1, respectively. As regards our models, the p-values from the Wald and LR tests are always greater than 0.05.

The final aim of the analysis is to evaluate how a single outlier affects the behavior of the competing line-fitting methods. We create a “perturbed” version of the m.twins data by adding a single outlier at $(\bar{x}_1-1.6,\ \bar{x}_2+1.6)$, where $\bar{x}_1$ and $\bar{x}_2$ are the sample means of the log-transformed chest circumferences of the first and second twin, respectively. The scatter plot of the perturbed log-transformed data is displayed in Figure 9.
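In R, assuming logX is the n × 2 matrix of log-transformed chest circumferences (the name is ours), the perturbed data can be obtained as:

```r
# Perturbed m.twins data: one outlier at (xbar1 - 1.6, xbar2 + 1.6).
outlier        <- c(mean(logX[, 1]) - 1.6, mean(logX[, 2]) + 1.6)
logX_perturbed <- rbind(logX, outlier)
```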

Figure 9: Scatter plot of the log-transformed m.twins data with the inclusion of an outlier displayed as •.

Table 4 reports the summary results about the estimated elevation and slope for each competing method, with the corresponding lines displayed in Figure 10 for OLS, bisector, MA, and SMA, and in Figure 11 for the remaining approaches.

Table 4:

Estimates and 95% confidence intervals (CIs) for the parameters α and β of the competing lines fitted to the m.twins data with a single outlier (Figure 9). The last two columns report the p-values of the Wald and LR tests for H0:β = 1.

                     95% CI for α                    95% CI for β          H0: β = 1 (p-value)
Method     α̂         2.5%      97.5%      β̂         2.5%      97.5%      Wald     LR
OLS        5.6369    4.7729    6.5009     −0.3075   −0.5089   −0.1061    –        –
MA         8.5935    5.7935    11.3935    −0.9976   −2.1832   −0.4551    –        –
SMA        8.6006    7.7361    9.4651     −0.9993   −1.2208   −0.8179    –        –
R-MA       0.0462    −0.2177   0.3102     0.9895    0.9299    1.0528     –        –
R-SMA      0.0447    −0.2102   0.2997     0.9899    0.9324    1.0509     –        –
Bisector   8.6020    8.5509    8.6532     −0.9996   −1.0164   −0.9827    –        –
MSSEN      0.0824    −0.1499   0.3146     0.9809    0.9266    1.0351     0.4895   0.4962
MSt        0.0707    −0.2248   0.3662     0.9837    0.9148    1.0525     0.6418   1.0000
C-MSSEN    0.0826    −0.1427   0.3079     0.9808    0.9283    1.0334     0.4744   0.5505
C-MSt      0.0711    −0.2157   0.3579     0.9836    0.9168    1.0503     0.6293   1.0000
Figure 10: Scatter plot of the log-transformed perturbed m.twins data and fitted lines from non-robust competing methods.

Figure 11: Scatter plot of the log-transformed perturbed m.twins data and fitted lines from robust competing methods.

From these plots we realize that, while the lines from R-MA, R-SMA, and our MSS models are not affected at all by the outlier, the lines from the other methods are severely dragged towards that point; this is especially true for the bisector, MA, and SMA methods, where the outlier lies on the fitted line. Similar insights can be obtained by comparing Table 3 with Table 4.

5.2 Anthropometric measurements of female twins

The second analysis considers the f.twins data set, also included in the Flury package. The data set contains d = 6 anthropometric measurements for n = 79 pairs of monozygotic/dizygotic female twins measured in the 1950s at the University of Hamburg, Germany. The measurements, expressed in centimeters, for each pair of twins are: stature (STA1 and STA2), hip width (HIP1 and HIP2), and chest circumference (CHE1 and CHE2) for each of the two sisters. For details on these data, see [42].

The matrix of pairwise scatter plots for the log-transformed data is displayed in Figure 12. These data can be considered symmetric according to the Mardia test of symmetry (p-value = 0.205).

Figure 12: Matrix of scatter plots of the log-transformed f.twins data.

Table 5 presents the model comparison. The best four models for the AIC are the MSS ones, with the MSSEN and C-MSSEN occupying the first and second position, respectively. As concerns the BIC, the best two models are the constrained MSS distributions (the C-MSSEN is the overall best), followed by the MSSEN. This podium is confirmed by the LR test of multivariate normality, with the models on the podium being the only ones leading to rejection of the null hypothesis. The better performance of the constrained MSS models, suggested by the BIC, is corroborated by the p-values from the LR test of MSS-parsimony: the C-MSSEN should be preferred to the MSSEN (p-value = 0.206), and the C-MSt should be preferred to the MSt (p-value = 0.640).

Table 5:

Model comparison, for the f.twins data, in terms of number of parameters (#par), log-likelihood, AIC, BIC, and p-values from the LR tests of multivariate normality and MSS-parsimony. Rankings induced by the considered criteria are also displayed.

Model     #par  Log-Lik.  AIC       Rank  BIC       Rank  Normality LR p-value  Rank  MSS-parsimony LR p-value
MN         27   763.103   1472.207    7   1408.232    4   –                      –    –
Mt         28   763.675   1471.350   12   1405.006   11   0.450                 11    –
MTIN       28   763.690   1471.380   11   1405.035   10   0.444                 10    –
MSEN       28   764.802   1473.605    5   1407.260    5   0.192                  5    –
MLN        28   763.845   1471.690   10   1405.345    9   0.389                  9    –
MCN        29   764.286   1470.573   13   1401.859   13   0.554                 12    –
MSVG       28   764.271   1472.543    6   1406.198    6   0.280                  6    –
MSH        28   764.059   1472.119    8   1405.774    7   0.328                  7    –
SNIG       28   763.900   1471.800    9   1405.455    8   0.372                  8    –
MSGH       29   764.271   1470.543   14   1401.829   14   0.558                 13    –
MSSEN      33   778.249   1490.499    1   1412.307    3   0.019                  3    0.206
MSt        33   773.235   1480.470    4   1402.278   12   0.119                  4    0.640
C-MSSEN    28   771.059   1486.118    2   1419.773    1   0.005                  1    –
C-MSt      28   769.845   1483.691    3   1417.346    2   0.009                  2    –

6 Conclusions and discussion

In allometric studies, the joint distribution of log-transformed morphometric variables is often assumed to be elliptical. Moreover, with reference to the bivariate case, we know the importance of using a good line-fitting method, preferably one that is robust in the presence of outliers. Concerning the first point, by considering real allometric data, and by using multiple scaled symmetric (MSS) distributions in this context for the first time, we showed the existence of cases where the symmetry exhibited by the available data differs from the elliptical one. As further merits of the paper, we introduced a new member of this class of models, the multiple scaled shifted exponential normal distribution, and proposed a simple way to add parsimony to MSS distributions. Concerning the second point, by using artificial and real data we showed that the fitted line from MSS distributions is: 1) more robust to outliers when compared to the classical line-fitting methods considered in allometry, and 2) comparable with robust versions of the major axis and standardized major axis approaches based on Huber’s M-estimator.

As an open point for further research, it could be interesting to consider MSS distributions in common principal component analysis (CPCA), a generalization of standard PCA to several groups under the rigid mathematical assumption of equality of all principal components across groups. From an allometric point of view, this would allow us to extend our inferential results to the case where the user is interested in comparing several slopes or elevations, and in testing for shift along the axis amongst several groups (refer to [6]).


Corresponding author: Luca Bagnato, Dipartimento di Scienze Economiche e Sociali, Università Cattolica del Sacro Cuore, Piacenza, Italy.

  1. Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.

  2. Research funding: None declared.

  3. Conflict of interest statement: The authors declare no conflicts of interest regarding this article.

Appendix

A Inference by maximum likelihood

A.1 Application of the EM algorithm

Given a random sample $\mathbf{x}_1,\ldots,\mathbf{x}_n$ (observed data) from $\mathbf{X}\sim\mathrm{MSS}_d(\boldsymbol{\mu},\boldsymbol{\Gamma},\boldsymbol{\Lambda},\boldsymbol{\theta})$, the ML estimation method is based on the maximization of the (observed-data) log-likelihood function

$$l(\boldsymbol{\Psi})=\sum_{i=1}^{n}\log f_{\mathrm{MSS}}(\mathbf{x}_i;\boldsymbol{\mu},\boldsymbol{\Gamma},\boldsymbol{\Lambda},\boldsymbol{\theta}),$$

where Ψ contains all the parameters of the model. However, the problem of maximizing $l(\boldsymbol{\Psi})$ over Ψ is not particularly easy due to the number of parameters and the necessity of maximizing over an orthogonal matrix Γ.

The ML fitting is simplified considerably by the application of the expectation-maximization (EM) algorithm [44], which is, indeed, the classical approach to find ML estimates for parameters of distributions defined as mixtures. For the application of the EM algorithm, it is convenient to view the observed data as incomplete. The complete data are $(\mathbf{x}_1,\mathbf{w}_1),\ldots,(\mathbf{x}_n,\mathbf{w}_n)$, where the missing variables $\mathbf{w}_1,\ldots,\mathbf{w}_n$ are defined so that

$$\mathbf{X}_i\mid\mathbf{W}_i=\mathbf{w}_i\sim\mathcal{N}_d(\boldsymbol{\mu},\boldsymbol{\Gamma}\boldsymbol{\Delta}_{\mathbf{w}_i}\boldsymbol{\Lambda}\boldsymbol{\Gamma}'),$$

independently for i = 1, …, n, and

$$\mathbf{W}_i\overset{\text{i.i.d.}}{\sim}\mathcal{M}_d(\boldsymbol{\theta}).$$

Because of this conditional structure, the complete-data likelihood function $L_c(\boldsymbol{\Psi})$ can be factored into the product of the conditional densities of $\mathbf{X}_i$ given the $\mathbf{w}_i$ and the joint marginal densities of $\mathbf{W}_i$, i.e.

$$L_c(\boldsymbol{\Psi})=\prod_{i=1}^{n} f_{\mathrm{MN}}(\mathbf{x}_i;\boldsymbol{\mu},\boldsymbol{\Gamma}\boldsymbol{\Delta}_{\mathbf{w}_i}\boldsymbol{\Lambda}\boldsymbol{\Gamma}')\,m(\mathbf{w}_i;\boldsymbol{\theta}).$$

Accordingly, the complete-data log-likelihood function can be written as

(12) $l_c(\boldsymbol{\Psi})=l_{1c}(\boldsymbol{\mu},\boldsymbol{\Gamma},\boldsymbol{\Lambda})+l_{2c}(\boldsymbol{\theta}),$

where

(13) $l_{1c}(\boldsymbol{\mu},\boldsymbol{\Gamma},\boldsymbol{\Lambda})=-\dfrac{nd}{2}\log(2\pi)+\dfrac{1}{2}\sum_{i=1}^{n}\sum_{h=1}^{d}\log(w_{ih})-\dfrac{n}{2}\sum_{h=1}^{d}\log(\lambda_h)-\dfrac{1}{2}\sum_{i=1}^{n}(\mathbf{x}_i-\boldsymbol{\mu})'(\boldsymbol{\Gamma}\boldsymbol{\Delta}_{\mathbf{w}_i}\boldsymbol{\Lambda}\boldsymbol{\Gamma}')^{-1}(\mathbf{x}_i-\boldsymbol{\mu}),$

and

(14) $l_{2c}(\boldsymbol{\theta})=\sum_{h=1}^{d}\sum_{i=1}^{n}\log[m(w_{ih};\theta_h)].$

So, while $l_{1c}(\boldsymbol{\mu},\boldsymbol{\Gamma},\boldsymbol{\Lambda})$ is shared by all the MSS distributions, $l_{2c}(\boldsymbol{\theta})$ is distribution-dependent and, to be more precise, needs to take the parsimony constraint into account when applied. In detail, the $l_{2c}(\boldsymbol{\theta})$ term for the MSt is

(15) $l_{2c}(\boldsymbol{\theta})=\dfrac{1}{2}\sum_{h=1}^{d}\theta_h\sum_{i=1}^{n}\log(w_{ih})-\sum_{h=1}^{d}\sum_{i=1}^{n}\log(w_{ih})-\dfrac{1}{2}\sum_{h=1}^{d}\theta_h\sum_{i=1}^{n}w_{ih}+\dfrac{n}{2}\sum_{h=1}^{d}\theta_h\log\left(\dfrac{\theta_h}{2}\right)-n\sum_{h=1}^{d}\log\left[\Gamma\left(\dfrac{\theta_h}{2}\right)\right],$

while for the MSSEN it is

(16) $l_{2c}(\boldsymbol{\theta})=-\sum_{h=1}^{d}\theta_h\sum_{i=1}^{n}w_{ih}+n\sum_{h=1}^{d}\theta_h+n\sum_{h=1}^{d}\log(\theta_h).$

Working on $l_c(\boldsymbol{\Psi})$, the EM algorithm iterates between two steps, one E-step and one M-step, until convergence. These steps, for a generic iteration of the algorithm and for each considered MSS distribution, are detailed below. As in [45], [46], quantities/parameters marked with one dot correspond to the previous iteration, while those marked with two dots represent the estimates at the current iteration.

A.1.1 E-step

The E-step requires the calculation of

(17) $Q(\boldsymbol{\Psi}\mid\dot{\boldsymbol{\Psi}})=Q_1(\boldsymbol{\mu},\boldsymbol{\Gamma},\boldsymbol{\Lambda}\mid\dot{\boldsymbol{\Psi}})+Q_2(\boldsymbol{\theta}\mid\dot{\boldsymbol{\Psi}}),$

the conditional expectation of $l_c(\boldsymbol{\Psi})$ given the observed data $\mathbf{x}_1,\ldots,\mathbf{x}_n$, using $\dot{\boldsymbol{\Psi}}$ for Ψ. In (17) the two terms on the right-hand side are ordered as the two terms on the right-hand side of (12). As well explained in [20], p. 82, in order to compute $Q(\boldsymbol{\Psi}\mid\dot{\boldsymbol{\Psi}})$ we need to replace any function $g(W_{ih})$ of the latent mixing variables arising in (13) and (14) by the quantities $E_{\dot{\boldsymbol{\Psi}}}[g(W_{ih})\mid\mathbf{X}_i=\mathbf{x}_i]$, where the expectation, as the subscript indicates, is taken using $\dot{\boldsymbol{\Psi}}$ for Ψ, i = 1, …, n. To calculate these expectations we can observe that the conditional pdf of $W_{ih}\mid\mathbf{X}_i=\mathbf{x}_i$ satisfies $f(w_{ih}\mid\mathbf{x}_i;\boldsymbol{\Psi})\propto f(w_{ih},\mathbf{x}_i;\boldsymbol{\Psi})$, up to a constant of proportionality. For the MSt distribution we have

(18) $f(w_{ih}\mid\mathbf{x}_i;\boldsymbol{\Psi})\propto f(w_{ih},\mathbf{x}_i;\boldsymbol{\Psi})\propto w_{ih}^{\frac{\theta_h+1}{2}-1}\exp\left(-w_{ih}\left\{\dfrac{\theta_h}{2}+\dfrac{1}{2}\lambda_h^{-1}\left[\boldsymbol{\Gamma}'(\mathbf{x}_i-\boldsymbol{\mu})\right]_h^2\right\}\right)\propto f_G\left(w_{ih};\dfrac{\theta_h+1}{2},\ \dfrac{\theta_h}{2}+\dfrac{1}{2}\lambda_h^{-1}\left[\boldsymbol{\Gamma}'(\mathbf{x}_i-\boldsymbol{\mu})\right]_h^2\right).$

This means that $W_{ih}\mid\mathbf{X}_i=\mathbf{x}_i\sim\mathcal{G}\left(\frac{\theta_h+1}{2},\ \frac{\theta_h}{2}+\frac{1}{2}\lambda_h^{-1}\left[\boldsymbol{\Gamma}'(\mathbf{x}_i-\boldsymbol{\mu})\right]_h^2\right)$. For the MSSEN distribution we have

(19) $f(w_{ih}\mid\mathbf{x}_i;\boldsymbol{\Psi})\propto f(w_{ih},\mathbf{x}_i;\boldsymbol{\Psi})\propto w_{ih}^{\frac{1}{2}}\exp\left(-w_{ih}\left\{\theta_h+\dfrac{1}{2}\lambda_h^{-1}\left[\boldsymbol{\Gamma}'(\mathbf{x}_i-\boldsymbol{\mu})\right]_h^2\right\}\right)\propto\dfrac{\Gamma\left(\frac{3}{2}\right)}{\Gamma\left(\frac{3}{2},\ \theta_h+\frac{1}{2}\lambda_h^{-1}\left[\boldsymbol{\Gamma}'(\mathbf{x}_i-\boldsymbol{\mu})\right]_h^2\right)}\,f_G\left(w_{ih};\dfrac{3}{2},\ \theta_h+\dfrac{1}{2}\lambda_h^{-1}\left[\boldsymbol{\Gamma}'(\mathbf{x}_i-\boldsymbol{\mu})\right]_h^2\right),\quad w_{ih}>1.$

This means that $W_{ih}\mid\mathbf{X}_i=\mathbf{x}_i$ has a left-truncated gamma distribution (see, e.g., [47], [48], [49]) on the interval (1, ∞), with shape 3/2 and rate $\theta_h+\frac{1}{2}\lambda_h^{-1}\left[\boldsymbol{\Gamma}'(\mathbf{x}_i-\boldsymbol{\mu})\right]_h^2$, whose pdf is given in the last line of (19); in symbols

(20) $W_{ih}\mid\mathbf{X}_i=\mathbf{x}_i\sim LTG_{(1,\infty)}\left(\dfrac{3}{2},\ \theta_h+\dfrac{1}{2}\lambda_h^{-1}\left[\boldsymbol{\Gamma}'(\mathbf{x}_i-\boldsymbol{\mu})\right]_h^2\right).$

For the MSt model, the functions $g(W_{ih})$ arising in (13) and (15) are $g_1(w)=w$ and $g_2(w)=\log(w)$. Thanks to (18) we obtain

(21) $E_{\dot{\boldsymbol{\Psi}}}(W_{ih}\mid\mathbf{X}_i=\mathbf{x}_i)=\dfrac{\dot{\theta}_h+1}{\dot{\theta}_h+\dot{\lambda}_h^{-1}\left[\dot{\boldsymbol{\Gamma}}'(\mathbf{x}_i-\dot{\boldsymbol{\mu}})\right]_h^2}=:\dot{w}_{ih}$

and

(22) $E_{\dot{\boldsymbol{\Psi}}}[\log(W_{ih})\mid\mathbf{X}_i=\mathbf{x}_i]=\psi\left(\dfrac{\dot{\theta}_h+1}{2}\right)-\log\left(\dfrac{\dot{\theta}_h}{2}+\dfrac{1}{2}\dot{\lambda}_h^{-1}\left[\dot{\boldsymbol{\Gamma}}'(\mathbf{x}_i-\dot{\boldsymbol{\mu}})\right]_h^2\right)=\log(\dot{w}_{ih})+\left[\psi\left(\dfrac{\dot{\theta}_h+1}{2}\right)-\log\left(\dfrac{\dot{\theta}_h+1}{2}\right)\right],$

where ψ(z) denotes the digamma function. The term in square brackets on the right-hand side of (22) can be interpreted as the correction for just imputing the conditional mean value $\dot{w}_{ih}$ for $w_{ih}$ in $\log(w_{ih})$.

For the MSSEN model, the only function involved in (13) and (16) is $g_1(w)=w$. The expectation $E_{\dot{\boldsymbol{\Psi}}}(W_{ih}\mid\mathbf{X}_i=\mathbf{x}_i)$ is, according to (20) and to [49], a ratio of two incomplete gamma functions, or equivalently

(23) $\dot{w}_{ih}=\dfrac{\varphi_{\frac{3}{2}}\left(\dot{\theta}_h+\frac{1}{2}\dot{\lambda}_h^{-1}\left[\dot{\boldsymbol{\Gamma}}'(\mathbf{x}_i-\dot{\boldsymbol{\mu}})\right]_h^2\right)}{\varphi_{\frac{1}{2}}\left(\dot{\theta}_h+\frac{1}{2}\dot{\lambda}_h^{-1}\left[\dot{\boldsymbol{\Gamma}}'(\mathbf{x}_i-\dot{\boldsymbol{\mu}})\right]_h^2\right)}.$
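To make the down-weighting explicit, here is a small R sketch of the two conditional expectations as functions of the squared scaled distance $\dot{\delta}_{ih}=\dot{\lambda}_h^{-1}[\dot{\boldsymbol{\Gamma}}'(\mathbf{x}_i-\dot{\boldsymbol{\mu}})]_h^2$ (function names ours; misra() as sketched in Section 2.3):

```r
# E-step weights for PC h: large distances get small weights.
estep_w_mst <- function(delta, theta) (theta + 1) / (theta + delta)  # eq. (21)

estep_w_mssen <- function(delta, theta) {                            # eq. (23)
  z <- theta + delta / 2
  sapply(z, function(zi) misra(3/2, zi) / misra(1/2, zi))
}

estep_w_mst(c(0.1, 25), theta = 5)  # approx. 1.18 and 0.20
```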

So, by substituting $w_{ih}$ with $\dot{w}_{ih}$ in (13), we obtain

(24) $Q_1(\boldsymbol{\mu},\boldsymbol{\Gamma},\boldsymbol{\Lambda}\mid\dot{\boldsymbol{\Psi}})=-\dfrac{n}{2}\sum_{h=1}^{d}\log(\lambda_h)-\dfrac{1}{2}\sum_{i=1}^{n}(\mathbf{x}_i-\boldsymbol{\mu})'\boldsymbol{\Gamma}\boldsymbol{\Lambda}^{-1}\boldsymbol{\Delta}_{\dot{\mathbf{w}}_i}^{-1}\boldsymbol{\Gamma}'(\mathbf{x}_i-\boldsymbol{\mu}),$

where $\boldsymbol{\Delta}_{\dot{\mathbf{w}}_i}=\mathrm{diag}(\dot{w}_{i1}^{-1},\ldots,\dot{w}_{id}^{-1})$. In (24), the terms constant with respect to μ, Γ, and Λ are dropped.

Regardless of the considered mixing variable, Q 2 ( θ | Ψ ˙ ) can be written as

$$Q_2(\boldsymbol{\theta}\mid\dot{\boldsymbol{\Psi}})=\sum_{h=1}^{d}Q_{2h}(\theta_h\mid\dot{\boldsymbol{\Psi}}),$$

where

(25) $Q_{2h}(\theta_h\mid\dot{\boldsymbol{\Psi}})=-n\log\left[\Gamma\left(\dfrac{\theta_h}{2}\right)\right]+\dfrac{n}{2}\theta_h\log\left(\dfrac{\theta_h}{2}\right)+\dfrac{n}{2}\theta_h\left\{\dfrac{1}{n}\sum_{i=1}^{n}\left[\log(\dot{w}_{ih})-\dot{w}_{ih}\right]+\psi\left(\dfrac{\dot{\theta}_h+1}{2}\right)-\log\left(\dfrac{\dot{\theta}_h+1}{2}\right)\right\}$

for the MSt, and

(26) $Q_{2h}(\theta_h\mid\dot{\boldsymbol{\Psi}})=\theta_h\sum_{i=1}^{n}(1-\dot{w}_{ih})+n\log(\theta_h)$

for the MSSEN. In (25), the terms constant with respect to θ h are dropped.

A.1.2 M-step

The M-step, at the same iteration, requires the calculation of $\ddot{\boldsymbol{\Psi}}$ as the value of Ψ that maximizes $Q(\boldsymbol{\Psi}\mid\dot{\boldsymbol{\Psi}})$. According to the right-hand side of (17), $Q_1(\boldsymbol{\mu},\boldsymbol{\Gamma},\boldsymbol{\Lambda}\mid\dot{\boldsymbol{\Psi}})$ and $Q_2(\boldsymbol{\theta}\mid\dot{\boldsymbol{\Psi}})$ can be maximized separately with respect to the parameters they involve, with the maximization of $Q_1(\boldsymbol{\mu},\boldsymbol{\Gamma},\boldsymbol{\Lambda}\mid\dot{\boldsymbol{\Psi}})$ being independent of the particular MSS distribution considered.

$Q_1(\boldsymbol{\mu},\boldsymbol{\Gamma},\boldsymbol{\Lambda}\mid\dot{\boldsymbol{\Psi}})$ is a weighted log-likelihood, with weights $\dot{\mathbf{w}}_1,\ldots,\dot{\mathbf{w}}_n$, of n independent observations $\mathbf{x}_1,\ldots,\mathbf{x}_n$ from $\mathcal{N}_d(\boldsymbol{\mu},\boldsymbol{\Gamma}\boldsymbol{\Delta}_{\dot{\mathbf{w}}_1}\boldsymbol{\Lambda}\boldsymbol{\Gamma}'),\ldots,\mathcal{N}_d(\boldsymbol{\mu},\boldsymbol{\Gamma}\boldsymbol{\Delta}_{\dot{\mathbf{w}}_n}\boldsymbol{\Lambda}\boldsymbol{\Gamma}')$, respectively. So, it is possible to verify that the closed-form updates for μ and Λ are

(27) $\ddot{\boldsymbol{\mu}}=\dot{\boldsymbol{\Gamma}}\sum_{i=1}^{n}\tilde{\boldsymbol{\Delta}}_{\dot{\mathbf{w}}_i}\dot{\boldsymbol{\Gamma}}'\mathbf{x}_i$

and

(28) $\ddot{\boldsymbol{\Lambda}}=\dfrac{1}{n}\,\mathrm{diag}\left[\sum_{i=1}^{n}\boldsymbol{\Delta}_{\dot{\mathbf{w}}_i}^{-\frac{1}{2}}\dot{\boldsymbol{\Gamma}}'(\mathbf{x}_i-\ddot{\boldsymbol{\mu}})(\mathbf{x}_i-\ddot{\boldsymbol{\mu}})'\dot{\boldsymbol{\Gamma}}\boldsymbol{\Delta}_{\dot{\mathbf{w}}_i}^{-\frac{1}{2}}\right],$

where

$$\tilde{\boldsymbol{\Delta}}_{\dot{\mathbf{w}}_i}=\mathrm{diag}\left\{\dfrac{\dot{w}_{i1}}{\sum_{j=1}^{n}\dot{w}_{j1}},\ldots,\dfrac{\dot{w}_{id}}{\sum_{j=1}^{n}\dot{w}_{jd}}\right\}.$$
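As a compact R sketch of these two closed-form updates (function name and matrix layout ours: the rows of X are the observations, and W_dot is the n × d matrix of the current weights $\dot{w}_{ih}$):

```r
# Closed-form M-step updates (27)-(28) for mu and Lambda.
mstep_mu_lambda <- function(X, W_dot, Gamma_dot) {
  Y <- X %*% Gamma_dot                            # Y[i, h] = [Gamma' x_i]_h
  w_norm <- sweep(W_dot, 2, colSums(W_dot), "/")  # rows of Delta-tilde
  mu <- drop(Gamma_dot %*% colSums(w_norm * Y))   # update (27)
  Yc <- sweep(X, 2, mu) %*% Gamma_dot             # centered, rotated data
  lambda <- colMeans(W_dot * Yc^2)                # update (28)
  list(mu = mu, lambda = lambda)
}
```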

From the form of the ML updates in (27) and (28), it is possible to note that $\ddot{\boldsymbol{\mu}}$ and $\ddot{\boldsymbol{\Lambda}}$ are weighted estimates in the space spanned by $\dot{\boldsymbol{\Gamma}}$. In that space, the hth dimension of the ith transformed unit $\dot{\boldsymbol{\Gamma}}'\mathbf{x}_i$, i.e. $[\dot{\boldsymbol{\Gamma}}'\mathbf{x}_i]_h$, contributes to the estimates of μ and Λ with weight $\dot{w}_{ih}$, i = 1, …, n and h = 1, …, d. For the MSt and MSSEN distributions, as we can note in (21) and (23), respectively, $\dot{w}_{ih}$ depends on the squared distance $\dot{\delta}_{ih}=\dot{\lambda}_h^{-1}[\dot{\boldsymbol{\Gamma}}'(\mathbf{x}_i-\dot{\boldsymbol{\mu}})]_h^2$. Interestingly, it is possible to show that $\dot{w}_{ih}$ decreases as $\dot{\delta}_{ih}$ increases (see [22] for the t and [14] for the SEN). As a consequence, the influence of $[\dot{\boldsymbol{\Gamma}}'\mathbf{x}_i]_h$ values associated with large distances $\dot{\delta}_{ih}$ is reduced (down-weighted). Therefore, MSS distributions provide robust directional estimates of μ and Λ in the presence of mild outliers. We use the term “directional” because the down-weighting acts separately for each dimension spanned by $\dot{\boldsymbol{\Gamma}}$. The update of Γ is obtained as

$$\ddot{\boldsymbol{\Gamma}}=\underset{\boldsymbol{\Gamma}}{\arg\min}\ \sum_{i=1}^{n}\mathrm{trace}\left[\boldsymbol{\Gamma}\left(\boldsymbol{\Delta}_{\dot{\mathbf{w}}_i}\ddot{\boldsymbol{\Lambda}}\right)^{-1}\boldsymbol{\Gamma}'(\mathbf{x}_i-\ddot{\boldsymbol{\mu}})(\mathbf{x}_i-\ddot{\boldsymbol{\mu}})'\right].$$

We consider the PLR decomposition for orthogonal matrices, proposed in [8], in order to make the minimization unconstrained. Operationally, the minimization is done via the optim() function for R. The BFGS algorithm, passed to optim() via the argument method, is used for minimization.

The maximization of $Q_2(\boldsymbol{\theta}\mid\dot{\boldsymbol{\Psi}})$ depends on the considered MSS distribution. For the MSt, a closed-form update for θ_h does not exist. As for the Γ update, we perform the numerical maximization of $Q_{2h}(\theta_h\mid\dot{\boldsymbol{\Psi}})$ in (25), with respect to θ_h > 1, by using the optim() function. In the constrained case θ1 = ⋯ = θ_d = θ, we obtain the update for the common θ by maximizing

$$Q_2(\theta\mid\dot{\boldsymbol{\Psi}})=-nd\log\left[\Gamma\left(\dfrac{\theta}{2}\right)\right]+\dfrac{nd}{2}\theta\log\left(\dfrac{\theta}{2}\right)+\dfrac{n}{2}\theta\left\{\dfrac{1}{n}\sum_{h=1}^{d}\sum_{i=1}^{n}\left[\log(\dot{w}_{ih})-\dot{w}_{ih}\right]+d\,\psi\left(\dfrac{\dot{\theta}+1}{2}\right)-d\log\left(\dfrac{\dot{\theta}+1}{2}\right)\right\}$$

over θ. We always use the optim() function with this aim.

For the MSSEN, $Q_{2h}(\theta_h\mid\dot{\boldsymbol{\Psi}})$ in (26) is the log-likelihood function of n independent observations $\dot{w}_{1h},\ldots,\dot{w}_{nh}$ from $\mathcal{SE}(\theta_h)$, h = 1, …, d. Therefore, from the standard theory about the exponential distribution (see, e.g., [50], Chapter 19), the update for θ_h, h = 1, …, d, is

(29) $\ddot{\theta}_h=\dfrac{n}{\sum_{i=1}^{n}(\dot{w}_{ih}-1)}.$

In the constrained case, the update for the common θ is

(30) $\ddot{\theta}=\dfrac{nd}{\sum_{h=1}^{d}\sum_{i=1}^{n}(\dot{w}_{ih}-1)}.$

Advantageously, the updates for θ_h in (29) and for θ in (30) have a closed form.

A.2 Standard errors

Let $\hat{\boldsymbol{\Psi}}$ be the ML estimator of Ψ, computationally obtained, in our case, via the EM algorithm illustrated in Appendix A.1. The need now arises to estimate the precision of the ML estimates. This is typically accomplished by calculating $\widehat{\mathrm{Cov}}(\hat{\boldsymbol{\Psi}})$, the estimated covariance matrix of $\hat{\boldsymbol{\Psi}}$; the square roots of the diagonal elements of $\widehat{\mathrm{Cov}}(\hat{\boldsymbol{\Psi}})$ are then reported as standard errors of the ML estimates [51]. However, the computation of $\widehat{\mathrm{Cov}}(\hat{\boldsymbol{\Psi}})$ is a difficult and tedious task because obtaining the derivatives of the log-likelihood is not analytically trivial. To overcome the problem, current alternative methods are numeric (using either the complete-data or the observed-data log-likelihood) and bootstrap-based (using the parametric, nonparametric, and weighted bootstrap approaches); see [52] for details. We follow the former numeric approach, directly applied to the observed-data log-likelihood.

Before presenting the method we use, some notation is needed. Let

$$s(\boldsymbol{\Psi})=\sum_{i=1}^{n}s_i(\boldsymbol{\Psi})$$

be the score vector, where

$$s_i(\boldsymbol{\Psi})=\dfrac{\partial \log f_{\mathrm{MSS}}(\mathbf{x}_i;\boldsymbol{\Psi})}{\partial\boldsymbol{\Psi}}$$

is the contribution of the ith observation to the score vector.

In ML theory, $\widehat{\mathrm{Cov}}(\hat{\boldsymbol{\Psi}})$ is usually obtained from the information matrix $\mathcal{I}(\boldsymbol{\Psi})$. If the model is correctly specified, then $\mathcal{I}(\boldsymbol{\Psi})$ can be defined by

$$\mathcal{I}(\boldsymbol{\Psi})=E\left[s(\boldsymbol{\Psi})\,s(\boldsymbol{\Psi})'\right].$$

In our case we cannot obtain these expectations analytically. A simple and classical way to estimate ℐ (Ψ), based on the score vector (see, e.g., [52], [53]), is given by

$$\mathcal{I}(\hat{\boldsymbol{\Psi}})=\sum_{i=1}^{n}s_i(\hat{\boldsymbol{\Psi}})\,s_i(\hat{\boldsymbol{\Psi}})'.$$

We calculate a numerical approximation of the score vector, evaluated at $\hat{\boldsymbol{\Psi}}$, by using the grad() function included in the numDeriv package [54] for R.
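A minimal R sketch of this score-based computation is given below (the function name is ours, and logdens is assumed to be a user-supplied function returning $\log f_{\mathrm{MSS}}(\mathbf{x}_i;\boldsymbol{\Psi})$ for a single observation):

```r
# Score-based standard errors: numerically differentiate the log-density of
# each observation at Psi_hat, accumulate s_i s_i', invert, take sqrt of diag.
library(numDeriv)

score_se <- function(X, psi_hat, logdens) {
  S <- t(apply(X, 1, function(xi) grad(function(p) logdens(xi, p), psi_hat)))
  I_hat <- crossprod(S)      # sum_i s_i(Psi_hat) s_i(Psi_hat)'
  sqrt(diag(solve(I_hat)))   # standard errors of the ML estimates
}
```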

A.3 Confidence intervals and tests

Once standard errors are computed, formulating confidence intervals and test statistics is not a difficult task for ML estimators.

Let Ψ be a generic parameter in $\boldsymbol{\Psi}$ and let $\hat{\Psi}$, the corresponding element of $\hat{\boldsymbol{\Psi}}$, be the ML estimator of Ψ. If we denote the estimated standard error of $\hat{\Psi}$ as $\widehat{\mathrm{se}}(\hat{\Psi})$, then its main use is to test H0: Ψ = Ψ0 by the Wald statistic

(31) $Z=\dfrac{\hat{\Psi}-\Psi_0}{\widehat{\mathrm{se}}(\hat{\Psi})},$

or to compute Wald confidence intervals. For example, the Wald 95% confidence interval for Ψ is

$$\hat{\Psi}\pm 1.96\,\widehat{\mathrm{se}}(\hat{\Psi}).$$

Under H0 the asymptotic distribution of Z in (31) can be approximated by a standard normal.

An alternative way to test H0:Ψ = Ψ0 is by using a likelihood ratio test where the null model is fitted by fixing Ψ = Ψ0, while the alternative model is fitted without this constraint. The test statistic is

$$LR=-2\left[l(\hat{\Psi}_0)-l(\hat{\Psi}_1)\right],$$

where $\hat{\Psi}_0$ and $\hat{\Psi}_1$ are the ML estimates of Ψ under the null and alternative models, respectively. Using Wilks’ theorem, under H0 the distribution of LR can be approximated by a χ² random variable with one degree of freedom (given by the difference in the number of estimated parameters between the alternative and the null models), and this allows us to compute a p-value.
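Both tests are straightforward to code once the estimates, standard errors, and log-likelihoods are available; a minimal R sketch (function names ours):

```r
# Wald and LR tests of H0: Psi = Psi0 (e.g., H0: beta = 1 in Section 5.1).
wald_p <- function(est, se, value0 = 1) {
  z <- (est - value0) / se   # Wald statistic (31)
  2 * pnorm(-abs(z))         # two-sided p-value
}

lr_p <- function(loglik_null, loglik_alt) {
  LR <- -2 * (loglik_null - loglik_alt)
  pchisq(LR, df = 1, lower.tail = FALSE)
}
```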

References

1. Huxley, J. Problems of relative growth. London, UK: Methuen; 1993. https://doi.org/10.56021/9780801846595.

2. Klingenberg, CP. Multivariate allometry. In: Marcus, LE, Corti, M, Loy, A, Naylor, GJP, Slice, DE, editors. Advances in morphometrics. Boston, MA: Springer; 1996:23–49 pp. https://doi.org/10.1007/978-1-4757-9083-2_3.

3. Pimentel, RA. Morphometrics, the multivariate analysis of biological data. Dubuque, IA: Kendall/Hunt Pub. Co; 1979.

4. Reyment, RA. Multidimensional palaeobiology. Oxford: Pergamon Press; 1991.

5. Bookstein, FL. Morphometric tools for landmark data: geometry and biology. Cambridge: Cambridge University Press; 1997.

6. Warton, DI, Wright, IJ, Falster, DS, Westoby, M. Bivariate line-fitting methods for allometry. Biol Rev 2006;81:259–91. https://doi.org/10.1017/s1464793106007007.

7. Taskinen, S, Warton, DI. Robust tests for one or more allometric lines. J Theor Biol 2013;333:38–46. https://doi.org/10.1016/j.jtbi.2013.05.010.

8. Bagnato, L, Punzo, A. Unconstrained representation of orthogonal matrices with application to common principal components. Comput Stat 2020 Oct 27. https://doi.org/10.1007/s00180-020-01041-8 [Epub ahead of print].

9. Robinson, AP, Hamann, JD. Forest analytics with R: an introduction. Use R! series. New York: Springer; 2010. https://doi.org/10.1007/978-1-4419-7762-5_1.

10. Andrews, DF, Mallows, CL. Scale mixtures of normal distributions. J Roy Stat Soc B 1974;36:99–102. https://doi.org/10.1111/j.2517-6161.1974.tb00989.x.

11. Isobe, T, Feigelson, ED, Akritas, MG, Babu, GJ. Linear regression in astronomy. Astrophys J 1990;364:104–13. https://doi.org/10.1086/169390.

12. Jolicoeur, P. The multivariate generalization of the allometry equation. Biometrics 1963;19:497–9. https://doi.org/10.2307/2527939.

13. Forbes, F, Wraith, D. A new family of multivariate heavy-tailed distributions with variable marginal amounts of tailweight: application to robust clustering. Stat Comput 2014;24:971–84. https://doi.org/10.1007/s11222-013-9414-4.

14. Punzo, A, Bagnato, L. Allometric analysis using the multivariate shifted exponential normal distribution. Biom J 2020;62:1525–43. https://doi.org/10.1002/bimj.201900248.

15. Taskinen, S, Warton, DI. Robust estimation and inference for bivariate line-fitting in allometry. Biom J 2011;53:652–72. https://doi.org/10.1002/bimj.201000018.

16. Barndorff-Nielsen, O, Kent, J, Sørensen, M. Normal variance-mean mixtures and z distributions. Int Stat Rev 1982;50:145–59. https://doi.org/10.2307/1402598.

17. Fang, KT, Kotz, S, Ng, KW. Symmetric multivariate and related distributions. Monographs on statistics and applied probability. USA: Springer; 2013.

18. Yamaguchi, K. Robust model and the EM algorithm. In: Watanabe, M, Yamaguchi, K, editors. The EM algorithm and related statistical models. Statistics: a series of textbooks and monographs, chapter 4. New York: Marcel Dekker; 2004:37–64 pp. https://doi.org/10.1201/9780203913055.ch4.

19. McLachlan, GJ, Peel, D. Finite mixture models. New York: John Wiley & Sons; 2000. https://doi.org/10.1002/0471721182.

20. McNeil, A, Frey, R, Embrechts, P. Quantitative risk management: concepts, techniques and tools. Princeton series in finance. Princeton, NJ: Princeton University Press; 2005.

21. Punzo, A, Bagnato, L. The multivariate tail-inflated normal distribution and its application in finance. J Stat Comput Simulat 2020 Aug 13. https://doi.org/10.1080/00949655.2020.1805451 [Epub ahead of print].

22. Lange, KL, Little, RJA, Taylor, JMG. Robust statistical modeling using the t distribution. J Am Stat Assoc 1989;84:881–96. https://doi.org/10.2307/2290063.

23. Kotz, S, Nadarajah, S. Multivariate t-distributions and their applications. Cambridge: Cambridge University Press; 2004. https://doi.org/10.1017/CBO9780511550683.

24. Punzo, A, McNicholas, PD. Parsimonious mixtures of multivariate contaminated normal distributions. Biom J 2016;58:1506–37. https://doi.org/10.1002/bimj.201500144.

25. Pfanzagl, J, Hamböker, R. Parametric statistical theory. De Gruyter textbook. Berlin: Walter de Gruyter; 1994. https://doi.org/10.1515/9783110889765.

26. Misra, RD. On the stability of crystal lattices. II. In: Mathematical proceedings of the Cambridge philosophical society. Cambridge: Cambridge University Press; 1940, vol 36:173–82 pp. https://doi.org/10.1017/S030500410001714X.

27. Abramowitz, M, Stegun, I. Handbook of mathematical functions: with formulas, graphs, and mathematical tables. Applied mathematics series, vol 55. New York: Dover Publications; 1965.

28. Tomarchio, SD, Punzo, A, Bagnato, L. Two new matrix-variate distributions with application in model-based clustering. Comput Stat Data Anal 2020;152:107050. https://doi.org/10.1016/j.csda.2020.107050.

29. R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2018.

30. Chen, H-J, Gnanadesikan, R, Kettenring, JR. Statistical methods for grouping corporations. Sankhyā Indian J Stat Ser B 1974;36:1–28.

31. Devlin, SJ, Gnanadesikan, R, Kettenring, JR. Robust estimation of dispersion matrices and principal components. J Am Stat Assoc 1981;76:354–62. https://doi.org/10.1080/01621459.1981.10477654.

32. Rousseeuw, PJ, Debruyne, M, Engelen, S, Hubert, M. Robustness and outlier detection in chemometrics. Crit Rev Anal Chem 2006;36:221–42. https://doi.org/10.1080/10408340600969403.

33. Kriegel, H-P, Kröger, P, Schubert, E, Zimek, A. A general framework for increasing the robustness of PCA-based correlation clustering algorithms. In: 20th international conference on scientific and statistical database management. Hong Kong, China: Springer; 2008:418–35 pp. https://doi.org/10.1007/978-3-540-69497-7_27.

34. Warton, DI, Duursma, RA, Falster, DS, Taskinen, S. smatr 3 – an R package for estimation and inference about allometric lines. Methods Ecol Evol 2012;3:257–9. https://doi.org/10.1111/j.2041-210x.2011.00153.x.

35. Wooldridge, J. Introductory econometrics: a modern approach. ISE – International Student Edition. Mason, OH: Cengage Learning; 2008.

36. Bagnato, L, Punzo, A, Zoia, MG. The multivariate leptokurtic-normal distribution and its application in model-based clustering. Can J Stat 2017;45:95–119. https://doi.org/10.1002/cjs.11308.

37. Punzo, A, Mazza, A, McNicholas, PD. ContaminatedMixt: an R package for fitting parsimonious mixtures of multivariate contaminated normal distributions. J Stat Software 2018;85:1–25. https://doi.org/10.18637/jss.v085.i10.

38. Luethi, D, Breymann, W. ghyp: a package on generalized hyperbolic distribution and its special cases. R package version 1.5.7 (2016-08-17); 2016.

39. Akaike, H. A new look at the statistical model identification. IEEE Trans Automat Contr 1974;19:716–23. https://doi.org/10.1109/tac.1974.1100705.

40. Schwarz, G. Estimating the dimension of a model. Ann Stat 1978;6:461–4. https://doi.org/10.1214/aos/1176344136.

41. Flury, B. Flury: data sets from Flury, 1997. R package version 0.1-3; 2012.

42. Flury, B. A first course in multivariate statistics. Springer texts in statistics. New York: Springer; 2013.

43. Korkmaz, S, Goksuluk, D, Zararsiz, G. MVN: multivariate normality tests. R package version 5.6; 2019.

44. Dempster, A, Laird, N, Rubin, D. Maximum likelihood from incomplete data via the EM algorithm. J Roy Stat Soc B 1977;39:1–38. https://doi.org/10.1111/j.2517-6161.1977.tb01600.x.

45. Melnykov, V, Zhu, X. On model-based clustering of skewed matrix data. J Multivariate Anal 2018;167:181–94. https://doi.org/10.1016/j.jmva.2018.04.007.

46. Melnykov, V, Zhu, X. Studying crime trends in the USA over the years 2000–2012. Adv Data Anal Classif 2019;13:325–41. https://doi.org/10.1007/s11634-018-0326-1.

47. Dagpunar, JS. Sampling of variates from a truncated gamma distribution. J Stat Comput Simulat 1978;8:59–64. https://doi.org/10.1080/00949657808810248.

48. Philippe, A. Simulation of right and left truncated gamma distributions by mixtures. Stat Comput 1997;7:173–81. https://doi.org/10.1023/A:1018534102043.

49. Coffey, CS, Muller, KE. Properties of doubly-truncated gamma variables. Commun Stat Theor Methods 2000;29:851–7. https://doi.org/10.1080/03610920008832519.

50. Johnson, NL, Kotz, S. Continuous univariate distributions. New York: John Wiley & Sons; 1970, vol 1.

51. Pawitan, Y. In all likelihood: statistical modelling and inference using likelihood. Oxford: Oxford University Press; 2013.

52. Boldea, O, Magnus, JR. Maximum likelihood estimation of the multivariate normal mixture model. J Am Stat Assoc 2009;104:1539–49. https://doi.org/10.1198/jasa.2009.tm08273.

53. Basford, KE, Greenway, DR, McLachlan, GJ, Peel, D. Standard errors of fitted component means of normal mixtures. Comput Stat 1997;12:1–18.

54. Gilbert, P, Varadhan, R. numDeriv: accurate numerical derivatives. R package version 2016.8-1.1; 2019.

Received: 2019-09-12
Accepted: 2020-12-07
Published Online: 2021-01-18

© 2021 Antonio Punzo and Luca Bagnato, published by De Gruyter, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 International License.
