1 Introduction: sources of correlation

Transformation or correction of the original data system (y) may affect the error model, since the error component is transformed as well. As a consequence of the preliminary transformation, statistically independent measurement data may become more or less correlated. This correlation has two basic sources:

  1.

    The transformation (e.g. correction) function may incorporate random variable(s); all the transformed data then contain these variable(s), which become the origin of the correlation. (E.g. the Bouguer correction of gravity data embodies the Bouguer plate density, the static correction of seismic data contains the equivalent upper layer velocity, the shale effect correction in well logging interpretation requires the shale parameters, etc.)

  2.

    During the transformation, several original data points are used to calculate one transformed data point (convolution, DFT, interpolation, etc.); hence the elements of the transformed system also share a statistically common part, which likewise results in data correlation.

Eventually the statistically common part causes correlation between the elements of the transformed data (z). The degree of this effect also depends on the applied error model. In this study the most widely used error model was applied and examined: zero-mean, additive Gaussian noise (Δy ∈ N(0, σy)). Hence the measured and transformed data vectors are:

$$\begin{aligned} {\mathbf{y}} & = {\mathbf{s}} + {\varvec{\Delta}}{\mathbf{y}} \\ {\mathbf{z}} & = T({\mathbf{y}}) \\ \end{aligned}$$
(1)

where s is the noiseless signal vector and T(.) denotes the transformation. In most customary preliminary data processing steps this is a linear transformation.

$${\mathbf{z}} = T({\mathbf{y}}) = T({\mathbf{s}}) + T({\varvec{\Delta}}{\mathbf{y}})$$
(2)

The purpose of this paper is to analyse the correlation effect in inversion, which often appears on preprocessed data and changes the covariance structure.

2 Correlated data inversion

The maximum likelihood or the Bayesian approach (if used consistently) automatically handles the data correlation problem. The joint probability density function (f) cannot be written as a product, as it can be for an independent data system (Tarantola 1987; Szatmáry 2002). For correlated zero-mean Gaussian noise the well-known likelihood function is:

$$f\left( {z|C_{\Delta z} ,z_{0} \left( {\mathbf{p}} \right)} \right) = \frac{1}{{\left( {2\pi } \right)^{k/2} \left| {{\mathbf{C}}_{\Delta z} } \right|^{1/2} }}e^{{ - \frac{1}{2}\left( {{\mathbf{z}} - {\mathbf{z}}_{{\mathbf{0}}} \left( {\mathbf{p}} \right)} \right)^{T} {\mathbf{C}}_{\Delta z}^{ - 1} \left( {{\mathbf{z}} - {\mathbf{z}}_{{\mathbf{0}}} \left( {\mathbf{p}} \right)} \right)}}$$
(3)

where CΔz is the covariance matrix of the correlated Gaussian random vector variable z, (z − z0) is the error term (Δz) of vector z, z0 is the centre of the distribution, and k is the dimension of the measurement space. Here T denotes the transpose of vectors and |CΔz| is the determinant of the covariance matrix. The function in Eq. (3) is a conditional probability density function with respect to the parameters (p) and the error covariance matrix.

The quadratic form in the exponent, with the inverse of the covariance matrix, will be the basis of the metric for parameter fitting: the well-known Mahalanobis distance (Mahalanobis 1936). The nonzero off-diagonal elements of CΔz (normalized by the standard deviations) indicate the measure of random dependence between the preprocessed random variables (the elements of z). This matrix can be transformed into a correlation matrix to study the correlation.
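As a minimal numerical sketch (NumPy, with assumed covariance values, not taken from any measured data set), the normalization of CΔz into a correlation matrix and the evaluation of the Mahalanobis quadratic form of Eq. (3) can be written as:

```python
import numpy as np

# Hypothetical covariance matrix C_dz of a 3-point correlated data set.
C = np.array([[2.0, 0.8, 0.3],
              [0.8, 1.5, 0.6],
              [0.3, 0.6, 1.0]])

# Normalizing by the standard deviations gives the correlation matrix,
# whose off-diagonal elements measure the dependence between data points.
sd = np.sqrt(np.diag(C))
R = C / np.outer(sd, sd)

# Mahalanobis quadratic form of a residual dz = z - z0(p), the exponent of Eq. (3).
dz = np.array([0.5, -0.2, 0.1])
d2 = dz @ np.linalg.inv(C) @ dz
```

The diagonal of R is identically 1, and d2 is positive for any nonzero residual because the covariance matrix is positive definite.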

3 Case of correction

In the data evaluation process, the model of the whole direct problem is often divided into two parts, and the effect of the part of no geological interest is separated as a correction (Bouguer correction, seismic static correction, etc.) in order to enhance the essential information. Sometimes the correction involves a random parameter (like the Bouguer plate density, the upper zone velocity, etc.); after the correction, these random variables are built into the new corrected data set as a statistically common part, causing correlation in the error term of the corrected data.

3.1 Effect of subtractive correction

The effect of correlation resulting from a correction is demonstrated with a simple subtractive correction on a 1D noisy data system (y). The corrected data points are:

$$z_{i} = y_{i} - c\mu h_{i}$$
(4)

The constant c represents all non-random multipliers, μ is the equivalent petrophysical property of the correction (like the Bouguer plate density, the equivalent seismic velocity above the datum level, etc.), and h is the geometric parameter (e.g. the thickness of the layer) appearing in the correction.

The error part of the corrected data set (Δz) is composite, containing the independent measurement error vector (Δy), the error of the correction parameter (Δμ) and the error of the layer thicknesses (Δh). The error model is additive for all variables, with the associated centres of the distributions (si, h0,i, μ0):

$$\begin{gathered} y_{i} = s_{i} + \Delta y_{i} \hfill \\ h_{i} = h_{0,i} + \Delta h_{i} \hfill \\ \mu = \mu_{0} + \Delta \mu \hfill \\ \end{gathered}$$
(4a)

The detailed error term (Δz) of an element of the corrected data vector (z) is then the following:

$$z_{i} = s_{i} + \Delta y_{i} - c\left( {\mu_{0} + \Delta \mu } \right)\left( {h_{0,i} + \Delta h_{i} } \right) = s_{i} - c\mu_{0} h_{0,i} + \Delta y_{i} - c\Delta \mu h_{0,i} - c\mu_{0} \Delta h_{i} - c\Delta \mu \Delta h_{i}$$
(5)

The last three error terms appear as a consequence of the correction. To form the covariance matrix as the expectation value of the Δz dyadics, the different error terms in Eq. (5) are considered independent and zero mean, so:

$${\text{E}}({\varvec{\Delta}}{\mathbf{z}}{\varvec{\Delta}}{\mathbf{ z}}^{T} ) = {\text{E}}({\varvec{\Delta}}{\mathbf{ y}}{\varvec{\Delta}}{\mathbf{y}}^{T} + c^{2} \Delta \mu^{2} {\mathbf{h}}_{0} {\mathbf{h}}_{0}^{T} + c^{2} \Delta \mu^{2} {\varvec{\Delta}}{\mathbf{ h}}{\varvec{\Delta}}{\mathbf{ h}}^{T} + c^{2} \mu_{0}^{2} {\varvec{\Delta}}{\mathbf{ h}}{\varvec{\Delta}}{\mathbf{h}}^{T} )$$
(6)

After taking the expectation value we get the covariance matrix of corrected data:

$${\mathbf{C}}_{\Delta z} = {\mathbf{C}}_{\Delta y} + c^{2} \sigma_{\mu }^{2} {\mathbf{h}}_{0} {\mathbf{h}}_{0}^{T} + c^{2} (\mu_{0}^{2} + \sigma_{\mu }^{2} ){\mathbf{C}}_{\Delta h}$$
(7)

Therefore the covariance matrix of the corrected data is not diagonal, because of the second term of Eq. (7), which has a dyadic form. The diagonal elements are also modified, by the third term of Eq. (7). To derive the functional to be minimized in the inversion, the inverse of CΔz is needed for the calculation of the Mahalanobis distance.
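Equation (7) can be illustrated with a short numerical sketch (all parameter values below are assumed, chosen only to show the structure):

```python
import numpy as np

# Sketch of Eq. (7): C_dz = C_dy + c^2 s_mu^2 h0 h0^T + c^2 (mu0^2 + s_mu^2) C_dh
c, mu0 = 1.0, 2.0
sigma_y, sigma_mu, sigma_h = 0.1, 0.05, 0.02
h0 = np.array([10.0, 12.0, 15.0, 11.0])     # hypothetical layer thicknesses
n = h0.size

C_dy = sigma_y**2 * np.eye(n)               # independent raw measurement errors
C_dh = sigma_h**2 * np.eye(n)
C_dz = C_dy + c**2 * sigma_mu**2 * np.outer(h0, h0) \
            + c**2 * (mu0**2 + sigma_mu**2) * C_dh
```

The dyadic term fills the off-diagonal entries, so the corrected data are correlated, and the diagonal variances are enlarged relative to σy².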

This inverse, which is the metric tensor in the inversion functional, can be calculated using the Sherman–Morrison theorem (Sherman and Morrison 1949). The theorem gives the inverse of special matrices which can be decomposed into a diagonal and a dyadic part. By this theorem the metric tensor for the Mahalanobis distance will be:

$${\mathbf{C}}_{\Delta z}^{ - 1} = \left( {\frac{1}{{\sigma_{y}^{2} + c^{2} \left( {\mu_{0}^{2} + \sigma_{\mu }^{2} } \right)\sigma_{h}^{2} }}} \right){\mathbf{I}} - \frac{1}{{1 + \left( {\frac{{c^{2} \sigma_{\mu }^{2} }}{{\sigma_{y}^{2} + c^{2} \left( {\mu_{0}^{2} + \sigma_{\mu }^{2} } \right)\sigma_{h}^{2} }}} \right){\mathbf{h}}_{0}^{T} {\mathbf{h}}_{0} }}\left( {\frac{{c^{2} \sigma_{\mu }^{2} }}{{\left( {\sigma_{y}^{2} + c^{2} \left( {\mu_{0}^{2} + \sigma_{\mu }^{2} } \right)\sigma_{h}^{2} } \right)^{2} }}} \right){\mathbf{h}}_{0} {\mathbf{h}}_{0}^{T}$$
(8)

where I is the identity matrix. The inverse of original y vector covariance was:

$${\mathbf{C}}_{\Delta y}^{ - 1} = \frac{1}{{\sigma_{y}^{2} }}{\mathbf{I}}$$
(9)

The importance of the correlation correction [the second term in Eq. (8)] is characterized by the following weight, which depends on the ratio of the error term variances:

$$\frac{{c^{2} \sigma_{\mu }^{2} }}{{\sigma_{y}^{2} + c^{2} \left( {\mu_{0}^{2} + \sigma_{\mu }^{2} } \right)\sigma_{h}^{2} }}$$
(10)

The decomposition derived above [Eq. (8)] shows how the diagonal variances are blurred and how the different measuring points become coupled in the exponent of Eq. (3).
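The closed form of Eq. (8) can be verified numerically against a direct matrix inversion (parameter values assumed, as before):

```python
import numpy as np

# Check of Eq. (8): the Sherman-Morrison closed form of C_dz^{-1}
# must agree with the direct inverse of Eq. (7).
c, mu0 = 1.0, 2.0
sigma_y, sigma_mu, sigma_h = 0.1, 0.05, 0.02
h0 = np.array([10.0, 12.0, 15.0, 11.0])
n = h0.size

d = sigma_y**2 + c**2 * (mu0**2 + sigma_mu**2) * sigma_h**2   # diagonal part
b = c**2 * sigma_mu**2                                        # dyadic weight
C_dz = d * np.eye(n) + b * np.outer(h0, h0)

# Sherman-Morrison form, Eq. (8):
C_inv_SM = np.eye(n) / d \
         - (b / d**2) / (1 + (b / d) * (h0 @ h0)) * np.outer(h0, h0)

assert np.allclose(C_inv_SM, np.linalg.inv(C_dz))
```

Note that b / d is exactly the weight of Eq. (10) that controls how important the correlation correction is.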

3.2 Effect of a correction that is a nonlinear function of the random parameter

If the form of the correction is not linear, the modified covariance matrix (CΔz) can be approximated by linearization. The corrected data vector elements in this case are:

$$z_{i} = y_{i} + g(h_{i} ,\mu ) = y_{i} + g_{i}$$
(11)

where g(hi, μ) is the correction function. (E.g. the static correction in seismic preprocessing has this form, because it depends on the reciprocal of the upper zone velocity.) Expressing Eq. (11) with the associated error parts:

$$\begin{aligned} z_{i} & = s_{i} + \Delta y_{i} + g(h_{0,i} + \Delta h_{i} ,\mu_{0} + \Delta \mu ) \approx s_{i} + g\left( {h_{0,i} ,\mu_{0} } \right) + \Delta y_{i} + \frac{{\partial g\left( {h_{0,i} ,\mu_{0} } \right)}}{\partial h}\Delta h_{i} + \frac{{\partial g\left( {h_{0,i} ,\mu_{0} } \right)}}{\partial \mu }\Delta \mu \\ \Delta z_{i} & = \Delta y_{i} + \frac{{\partial g\left( {h_{0,i} ,\mu_{0} } \right)}}{\partial h}\Delta h_{i} + \frac{{\partial g\left( {h_{0,i} ,\mu_{0} } \right)}}{\partial \mu }\Delta \mu \\ \end{aligned}$$
(12)

After taking the expectation value of the ΔzΔzT dyadics in the first order approximation [Eq. (12)], the CΔz will be:

$${\mathbf{C}}_{\Delta z} \approx {\mathbf{C}}_{\Delta y} + \sigma_{h}^{2} {\mathbf{C}}_{1} + \sigma_{\mu }^{2} {\mathbf{uu}}^{T}$$
(13)

where the elements of the diagonal matrix C1 and of the vector u are the following:

$$\begin{aligned} C_{1,i,j} & = \delta_{i,j} \left( {\frac{{\partial g(h_{0,i} ,\mu_{0} )}}{\partial h}} \right)^{2} \\ u_{i} & = \frac{{\partial g(h_{0,i} ,\mu_{0} )}}{\partial \mu } \\ \end{aligned}$$
(14)

where δ is the Kronecker symbol. Using the Sherman–Morrison formula again, the inverse of CΔz is:

$${\mathbf{C}}_{\Delta z}^{ - 1} = \left( {{\mathbf{C}}_{\Delta y} + \sigma_{h}^{2} {\mathbf{C}}_{1} } \right)^{ - 1} - \frac{{\left( {{\mathbf{C}}_{\Delta y} + \sigma_{h}^{2} {\mathbf{C}}_{1} } \right)^{ - 1} {\mathbf{uu}}^{T} \left( {{\mathbf{C}}_{\Delta y} + \sigma_{h}^{2} {\mathbf{C}}_{1} } \right)^{ - 1} \sigma_{\mu }^{2} }}{{1 + {\mathbf{u}}^{T} \left( {{\mathbf{C}}_{\Delta y} + \sigma_{h}^{2} {\mathbf{C}}_{1} } \right)^{ - 1} {\mathbf{u}}\sigma_{\mu }^{2} }}$$
(15)

The second term of Eq. (15) expresses the random dependence of the corrected variables, which may modify the results of the inversion.
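Equations (13)-(15) can be checked numerically for an assumed correction function g(h, μ) = h/μ (a reciprocal-velocity form, as in a static correction); all numerical values are hypothetical:

```python
import numpy as np

# Sketch of Eqs. (13)-(15) with g(h, mu) = h / mu (assumed form).
mu0, sigma_y, sigma_h, sigma_mu = 2.0, 0.1, 0.02, 0.05
h0 = np.array([10.0, 12.0, 15.0, 11.0])
n = h0.size

dg_dh = np.full(n, 1.0 / mu0)         # dg/dh at (h0, mu0)
u = -h0 / mu0**2                      # u_i = dg/dmu at (h0, mu0), Eq. (14)

C_dy = sigma_y**2 * np.eye(n)
C1 = np.diag(dg_dh**2)                # diagonal matrix of Eq. (14)
A = C_dy + sigma_h**2 * C1            # diagonal part of Eq. (13)
C_dz = A + sigma_mu**2 * np.outer(u, u)

# Sherman-Morrison inverse, Eq. (15):
A_inv = np.linalg.inv(A)
C_inv = A_inv - (sigma_mu**2 * A_inv @ np.outer(u, u) @ A_inv) \
              / (1 + sigma_mu**2 * (u @ A_inv @ u))

assert np.allclose(C_inv, np.linalg.inv(C_dz))
```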

3.3 Effect of product type correction

Sometimes the correction has a product form, such as the seismic amplitude correction or multiplication by a sonde coefficient to transform the raw measured data into apparent values.

In the case of a product-type correction with a random parameter, the corrected data vector element has the following form:

$$\begin{aligned} z_{i} & = g(\mu_{0} + \Delta \mu )(s_{i} + \Delta y_{i} ) \approx \left( {g(\mu_{0} ) + \frac{{\partial g(\mu_{0} )}}{\partial \mu }\Delta \mu } \right)(s_{i} + \Delta y_{i} ) \\ & = g(\mu_{0} )s_{i} + g(\mu_{0} )\Delta y_{i} + \frac{{\partial g(\mu_{0} )}}{\partial \mu }s_{i} \Delta \mu + \frac{{\partial g(\mu_{0} )}}{\partial \mu }\Delta y_{i} \Delta \mu \\ \end{aligned}$$
(16)

where g(μ) is now the correction function (with a random parameter). The covariance matrix of the error part is:

$${\mathbf{C}}_{\Delta z} = \left( {g(\mu_{0} )} \right)^{2} {\mathbf{C}}_{\Delta y} + \sigma_{\mu }^{2} \left( {\frac{{\partial g(\mu_{0} )}}{\partial \mu }} \right)^{2} {\mathbf{ss}}^{T} + \left( {\frac{{\partial g(\mu_{0} )}}{\partial \mu }} \right)^{2} \sigma_{\mu }^{2} {\mathbf{C}}_{\Delta y}$$
(17)

Using the Sherman–Morrison formula to determine the inverse matrix:

$${\mathbf{C}}_{\Delta z}^{ - 1} = \frac{1}{{\left( {g(\mu_{0} )} \right)^{2} + \left( {\frac{{\partial g(\mu_{0} )}}{\partial \mu }} \right)^{2} \sigma_{\mu }^{2} }}{\mathbf{C}}_{\Delta y}^{ - 1} - \frac{{\frac{{\sigma_{\mu }^{2} \left( {\frac{{\partial g(\mu_{0} )}}{\partial \mu }} \right)^{2} }}{{\left( {\left( {g(\mu_{0} )} \right)^{2} + \left( {\frac{{\partial g(\mu_{0} )}}{\partial \mu }} \right)^{2} \sigma_{\mu }^{2} } \right)^{2} }}{\mathbf{C}}_{\Delta y}^{ - 1} {\mathbf{ss}}^{T} {\mathbf{C}}_{\Delta y}^{ - 1} }}{{1 + \frac{{\sigma_{\mu }^{2} \left( {\frac{{\partial g(\mu_{0} )}}{\partial \mu }} \right)^{2} }}{{\left( {g(\mu_{0} )} \right)^{2} + \left( {\frac{{\partial g(\mu_{0} )}}{\partial \mu }} \right)^{2} \sigma_{\mu }^{2} }}{\mathbf{s}}^{T} {\mathbf{C}}_{\Delta y}^{ - 1} {\mathbf{s}}}}$$
(18)

This inverse matrix is also separated into a diagonal and a dyadic part, as before. During the estimation procedure the s vector can be approximated by the y vector.
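A numerical sketch of the product-type case, with an assumed correction g(μ) = μ (so ∂g/∂μ = 1) and hypothetical signal values, verifies the Sherman–Morrison inverse of the covariance in Eq. (17):

```python
import numpy as np

# Sketch of Eqs. (17)-(18) with g(mu) = mu (assumed), dg/dmu = 1.
mu0, sigma_y, sigma_mu = 2.0, 0.1, 0.05
s = np.array([1.0, 1.5, 2.0, 1.2])    # hypothetical signal vector
n = s.size

g, dg = mu0, 1.0
a = g**2 + dg**2 * sigma_mu**2        # scalar multiplying C_dy in Eq. (17)
b = sigma_mu**2 * dg**2               # weight of the dyadic term s s^T
C_dy = sigma_y**2 * np.eye(n)
C_dz = a * C_dy + b * np.outer(s, s)

# Sherman-Morrison inverse of (a*C_dy + b*s s^T):
A_inv = np.linalg.inv(a * C_dy)
C_inv = A_inv - b * (A_inv @ np.outer(s, s) @ A_inv) / (1 + b * (s @ A_inv @ s))

assert np.allclose(C_inv, np.linalg.inv(C_dz))
```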

4 Case of linear transformation

A typical preliminary-phase operation in data processing is convolutional filtering, or filtering in the spectral range after the Discrete Fourier Transformation (DFT). In general, a linear transformation can be represented by a transformation matrix (F). The transformed data vector is:

$${\mathbf{z}} = {\mathbf{Fy}} = {\mathbf{Fs}} + {\mathbf{F}}{\varvec{\Delta }}y = {\mathbf{z}}_{{\mathbf{0}}} + {\varvec{\Delta}}{\mathbf{ z}}$$
(19)

The covariance matrix of the data vector z, obtained by linear transformation of originally independent data (y) with uniform variance, is:

$${\mathbf{C}}_{{\Delta {\mathbf{z}}}} = {\text{E}}\left( {{\varvec{\Delta}}{\mathbf{ z}}{\varvec{\Delta}}{\mathbf{ z}}^{{\text{T}}} } \right) = {\text{E}}\left( {{\mathbf{F}}\Delta y\Delta y^{T} {\mathbf{F}}^{T} } \right) = {\mathbf{FC}}_{\Delta y} {\mathbf{F}}^{T} = \sigma_{\Delta y}^{2} {\mathbf{FF}}^{T}$$
(20)

The structure of the matrix FFT determines the covariance structure, which reflects the possible statistical dependence of the elements of vector z. Substituting CΔz [Eq. (20)] into the Mahalanobis distance formula:

$$\left( {z - z_{0} \left( {\mathbf{p}} \right)} \right)^{T} {\mathbf{C}}_{{\Delta {\mathbf{z}}}}^{ - 1} \left( {z - z_{0} \left( {\mathbf{p}} \right)} \right) = {\varvec{\Delta}}{\mathbf{ z}}^{T} {\mathbf{C}}_{{\Delta {\mathbf{z}}}}^{ - 1} {\varvec{\Delta}}{\mathbf{ z}} = {\varvec{\Delta}}{\mathbf{ y}}^{T} {\mathbf{F}}^{T} \left( {{\mathbf{FC}}_{\Delta y} {\mathbf{F}}^{T} } \right)^{ - 1} {\mathbf{F}}{\varvec{\Delta }}y = {\varvec{\Delta}}{\mathbf{ y}}^{T} {\mathbf{C}}_{\Delta y}^{ - 1} {{\varvec{\Delta}}}y$$
(21)

It can be seen [Eq. (21)] that when the proper Mahalanobis metric is used (taking the possible data correlation into account), the criterion of parameter fitting does not change (assuming an invertible square transformation F). So the parameter variances estimated in the inversion also remain unchanged. This shows that the applied linear transformation does not increase the information content of the data set and cannot reduce the Cramér–Rao bound for the estimated parameter variance (Cramér 1946; Rao 1945).

If the “traditional” simple least squares estimation had been used, neglecting the correlation effect, we would get an apparently lower parameter variance, violating the Cramér–Rao bound. (The proper maximum likelihood estimation, by contrast, is known to be asymptotically efficient; that is, the variance of the estimated parameter reaches the Cramér–Rao bound asymptotically.)
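The invariance expressed in Eq. (21) is easy to verify numerically for a randomly chosen (almost surely invertible) transformation matrix; the values below are assumed for illustration:

```python
import numpy as np

# Check of Eq. (21): for an invertible linear transformation F, the
# Mahalanobis distance of the transformed residual equals that of the
# original residual, so the fitting criterion is unchanged.
rng = np.random.default_rng(0)
n = 5
F = rng.normal(size=(n, n))           # generic invertible transformation
dy = rng.normal(size=n)               # residual in the original data space
sigma_y = 0.1

C_dy = sigma_y**2 * np.eye(n)
C_dz = F @ C_dy @ F.T                 # Eq. (20)

d2_original = dy @ np.linalg.inv(C_dy) @ dy
d2_transformed = (F @ dy) @ np.linalg.inv(C_dz) @ (F @ dy)
assert np.allclose(d2_original, d2_transformed)
```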

4.1 Covariance of filtered data

During preliminary data processing, convolutional filtering is widely applied for noise suppression or for extracting a low-frequency trend or a high-frequency local effect. This procedure may also be a source of noise correlation, since the filtered data elements are compositions of several raw measurement data, so their errors are inherited. In the case of additive noise, the filtered data (z) and their decomposition can be expressed as:

$$z_{k} = \sum\limits_{i = - N}^{N} {w_{i} y_{k - i} } = \sum\limits_{i = - N}^{N} {w_{i} s_{k - i} } + \sum\limits_{i = - N}^{N} {w_{i} \Delta y_{k - i} }$$
(22)

where w is the vector of convolution weights in the filtering window of length 2N + 1. In the simpler matrix form (the matrix W is made up of the filter weights w):

$${\mathbf{z}} = {\mathbf{Wy}} = {\mathbf{Ws}} + {\mathbf{W}}{\varvec{\Delta }}y$$
(23)

The noise covariance matrix [in the same form as in Eq. (20)]:

$${\mathbf{C}}_{{\Delta {\mathbf{z}}}} = {\text{E}}\left[ {{\mathbf{W}}{\varvec{\Delta }}y{\varvec{\Delta}}{\mathbf{ y}}^{T} {\mathbf{W}}^{T} } \right] = {\mathbf{WC}}_{{\Delta {\mathbf{y}}}} {\mathbf{W}}^{T}$$
(24)

In this case CΔz is a band matrix, which may have nonzero off-diagonal elements. The diagonal elements of CΔy are partially distributed among the off-diagonal elements according to the w vector. Therefore the trace of the covariance matrix, which represents the overall variance of the data vector, decreases as a consequence of the filtering (compared to the variance of the raw data system).

Taking the 3-point moving average (MA) filter as an example, the element of the transformation matrix (W) is the following:

$$W_{ik} = \frac{1}{3}\left( {\delta_{i,k - 1} + \delta_{i,k} + \delta_{i,k + 1} } \right)$$
(25)

Using Eq. (24), the covariance matrix element of the MA filtered data is:

$$C_{\Delta z,ik} = \frac{{\sigma_{\Delta y}^{2} }}{9}\left( {\delta_{i,k - 2} + 2\delta_{i,k - 1} + 3\delta_{i,k} + 2\delta_{i,k + 1} + \delta_{i,k + 2} } \right)$$
(26)

It can be seen that the trace of the filtered data covariance matrix is one-third of the original covariance matrix trace. Although the overall data variance decreases after filtering, the functional for the parameter estimation (the Mahalanobis distance) is not changed, as shown in Eq. (21), provided the data correlation is also considered.
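The band structure of Eq. (26) and the one-third trace can be reproduced directly from Eq. (24); the sketch below builds W with periodic (circular) boundaries, an assumption made only to keep the example short:

```python
import numpy as np

# Check of Eqs. (25)-(26): covariance of 3-point moving-average filtered data.
n, sigma_y = 8, 1.0
W = np.zeros((n, n))
for i in range(n):
    for j in (i - 1, i, i + 1):       # circular boundary assumption
        W[i, j % n] = 1.0 / 3.0

C_dz = sigma_y**2 * W @ W.T           # Eq. (24)

# Band of Eq. (26): 3/9 on the diagonal, 2/9 and 1/9 beside it,
# and the trace drops to one-third of the raw-data covariance trace.
assert np.allclose(np.diag(C_dz), 3.0 / 9.0)
assert np.allclose(C_dz[0, 1], 2.0 / 9.0)
assert np.allclose(np.trace(C_dz), n * sigma_y**2 / 3.0)
```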

4.2 Covariance matrix of Fourier transformed data

The Discrete Fourier Transformation (DFT) is a frequently used tool in preprocessing. It can also be described in matrix form, so the transformed data vector is:

$${\mathbf{z}} = {\mathbf{F}}_{DFT} {\mathbf{y}} = {\mathbf{F}}_{DFT} {\mathbf{s}} + {\mathbf{F}}_{DFT} {{\varvec{\Delta}}}y$$
(27)

The element of FDFT is:

$$F_{DFT,m,n} = \exp \left( {\frac{ - i2\pi mn}{N}} \right)$$
(28)

where N is the number of data. Since the result of the transformation is complex, the derivation of the covariance matrix must be modified accordingly, using the conjugate of the DFT matrix \(\left( {\overline{{{\mathbf{F}}_{DFT} }} } \right)\):

$${\mathbf{C}}_{{\Delta {\text{z}}}} = {\text{E}}\left( {{\mathbf{F}}_{DFT} {\varvec{\Delta}}{\mathbf{ y}}{\varvec{\Delta}}{\mathbf{ y}}^{T} \left( {\overline{{{\mathbf{F}}_{DFT} }} } \right)^{T} } \right)$$
(29)

The DFT matrix is unitary up to normalization: with the 1/√N scaling, the transpose of its conjugate is its inverse. Hence the covariance matrix of the transformed data remains diagonal, so the conventional least squares method is still applicable in the spectral range.
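This can be confirmed numerically with the unitary-normalized DFT matrix (values assumed for illustration):

```python
import numpy as np

# Check of Eq. (29): with the 1/sqrt(N) normalization the DFT matrix is
# unitary, so the covariance of the transformed noise stays diagonal and
# ordinary least squares still applies in the spectral domain.
N, sigma_y = 8, 0.5
m = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(m, m) / N) / np.sqrt(N)

C_dy = sigma_y**2 * np.eye(N)
C_dz = F @ C_dy @ F.conj().T          # Eq. (29)

assert np.allclose(C_dz, sigma_y**2 * np.eye(N))
```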

5 Summary

The covariance structure of the initial data system is essential for the proper implementation of inversion. Various preprocessing steps are often performed beforehand, which may change the statistical properties of the data; this is sometimes ignored during the inversion. In this paper the noise covariance of transformed data was studied.

In the first part, different types of correction were examined (subtractive, multiplicative, nonlinear) in which the correction function contains random variables. It was shown that in these cases the basic functional of the inversion must be modified, because of the changes in the covariance structure. For the cases mentioned, the covariance matrix of the corrected data was derived, together with its inverse, which is needed for the likelihood function.

In the second part, the covariance structure of linearly transformed data was examined, studying the consequences of the transformation.