Covariance matrix estimation of the maximum likelihood estimator in multivariate clusterwise linear regression

  • Original Paper
  • Statistical Methods & Applications

Abstract

The expectation-maximisation (EM) algorithm is employed to perform maximum likelihood estimation in a wide range of situations, including regression analysis based on clusterwise regression models. A disadvantage of this algorithm is that it does not provide an assessment of the sampling variability of the maximum likelihood estimator. This is because the algorithm does not require an analytical expression for the Hessian matrix of the log-likelihood, which precludes a direct evaluation of the asymptotic covariance matrix of the estimator. A solution to this problem is developed for linear regression analysis through a multivariate Gaussian clusterwise regression model. Two estimators of the asymptotic covariance matrix of the maximum likelihood estimator are proposed. In practical applications, their use makes it possible to avoid resorting to bootstrap techniques and general-purpose mathematical optimisers. The performance of these estimators is evaluated by analysing small simulated and real datasets; the results illustrate their usefulness and effectiveness in practical applications. From a theoretical point of view, the proposed estimators are shown to be consistent under suitable conditions.

Fig. 1
Fig. 2

References

  • Aitkin M, Tunnicliffe Wilson G (1980) Mixture models, outliers, and the EM algorithm. Technometrics 22:325–331

  • Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csáki F (eds) Second international symposium on information theory. Akadémiai Kiadó, Budapest, pp 267–281

  • Arminger G, Stein P, Wittenberg J (1999) Mixtures of conditional mean and covariance structure models. Psychometrika 64:475–494

  • Baird IG, Quastel N (2011) Dolphin-safe tuna from California to Thailand: localisms in environmental certification of global commodity networks. Ann Assoc Am Geogr 101:337–355

  • Basford KE, Greenway DR, McLachlan GJ, Peel D (1997) Standard errors of fitted means under normal mixture models. Comput Stat 12:1–17

  • Benaglia T, Chauveau D, Hunter DR, Young D (2009) mixtools: an R package for analyzing finite mixture models. J Stat Softw 32(6):1–29

  • Boiteau G, Singh M, Singh RP, Tai GCC, Turner TR (1998) Rate of spread of PVY-n by alate Myzus persicae (Sulzer) from infected to healthy plants under laboratory conditions. Potato Res 41:335–344

  • Boldea O, Magnus JR (2009) Maximum likelihood estimation of the multivariate normal mixture model. J Am Stat Assoc 104:1539–1549

  • Bowden R (1973) The theory of parametric identification. Econometrica 41:1069–1074

  • Chevalier JA, Kashyap AK, Rossi PE (2003) Why don’t prices rise during periods of peak demand? Evidence from scanner data. Am Econ Rev 93:15–37

  • Dang UJ, McNicholas PD (2015) Families of parsimonious finite mixtures of regression models. In: Morlini I, Minerva T, Vichi M (eds) Advances in statistical models for data analysis. Springer, Cham, pp 73–84

  • Dang UJ, Punzo A, McNicholas PD, Ingrassia S, Browne RP (2017) Multivariate response and parsimony for Gaussian cluster-weighted models. J Classif 34(1):4–34

  • Dayton CM, Macready GB (1988) Concomitant-variable latent-class models. J Am Stat Assoc 83:173–178

  • Ding C (2006) Using regression mixture analysis in educational research. Pract Assess Res Eval 11:1–11

  • Dyer WJ, Pleck J, McBride B (2012) Using mixture regression to identify varying effects: a demonstration with paternal incarceration. J Marriage Fam 74:1129–1148

  • Elhenawy M, Rakha H, Chen H (2017) An automatic traffic congestion identification algorithm based on mixture of linear regressions. In: Helfert M, Klein C, Donnellan B, Gusikhin O (eds) Smart cities, green technologies, and intelligent transport systems. Springer, Cham, pp 242–256

  • Fair RC, Jaffe DM (1972) Methods of estimation for markets in disequilibrium. Econometrica 40:497–514

  • Faria S, Soromenho G (2010) Fitting mixtures of linear regressions. J Stat Comput Simul 80:201–225

  • Frühwirth-Schnatter S (2006) Finite mixture and Markov switching models. Springer, New York

  • Galimberti G, Scardovi E, Soffritti G (2016) Using mixtures in seemingly unrelated linear regression models. Stat Comput 26:1025–1038

  • García-Escudero LA, Gordaliza A, Mayo-Iscar A, San Martín R (2010) Robust clusterwise linear regression through trimming. Comput Stat Data Anal 54(12):3057–3069

  • García-Escudero LA, Gordaliza A, Greselin F, Ingrassia S, Mayo-Iscar A (2017) Robust estimation of mixtures of regressions with random covariates, via trimming and constraints. Stat Comput 27:377–402

  • Grün B, Leisch F (2008) FlexMix version 2: finite mixtures with concomitant variables and varying and constant parameters. J Stat Softw 28(4):1–35

  • Hennig C (2000) Identifiability of models for clusterwise linear regression. J Classif 17:273–296

  • Hosmer DW (1974) Maximum likelihood estimates of the parameters of a mixture of two regression lines. Commun Stat A Theory Methods 3:995–1006

  • Ingrassia S, Punzo A (2016) Decision boundaries for mixtures of regressions. J Korean Stat Soc 45:295–306

  • Jones PN, McLachlan GJ (1992) Fitting finite mixture models in a regression context. Austr J Stat 34:233–240

  • Kamakura W (1988) A least squares procedure for benefit segmentation with conjoint experiments. J Mark Res 25:157–167

  • Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22:79–86

  • Lamont AE, Vermunt JK, Van Horn ML (2016) Regression mixture models: does modeling the covariance between independent variables and latent classes improve the results? Multivar Behav Res 51:35–52

  • Louis TA (1982) Finding the observed information matrix when using the EM algorithm. J R Stat Soc B 44:226–233

  • Magnus JR, Neudecker H (1988) Matrix differential calculus with applications in statistics and econometrics. Wiley, New York

  • Maugis C, Celeux G, Martin-Magniette ML (2009) Variable selection for clustering with Gaussian mixture models. Biometrics 65:701–709

  • Mazza A, Punzo A (2017) Mixtures of multivariate contaminated normal regression models. Stat Pap. https://doi.org/10.1007/s00362-017-0964-y

  • Mazza A, Punzo A, Ingrassia S (2018) flexCWM: a flexible framework for cluster-weighted models. J Stat Softw 86(2):1–30

  • McDonald SE, Shin S, Corona R et al (2016) Children exposed to intimate partner violence: identifying differential effects of family environment on children’s trauma and psychopathology symptoms through regression mixture models. Child Abuse Negl 58:1–11

  • McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York

  • Meilijson I (1989) A fast improvement to the EM algorithm on its own terms. J R Stat Soc B 51:127–138

  • Newton MA, Raftery AE (1994) Approximate Bayesian inference with the weighted likelihood bootstrap (with discussion). J R Stat Soc B 56:3–48

  • R Core Team (2020) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. http://www.R-project.org

  • Rossi P (2019) bayesm: Bayesian inference for marketing/micro-econometrics. R package version 3.1-4. https://CRAN.R-project.org/package=bayesm

  • Schott JR (2005) Matrix analysis for statistics, 2nd edn. Wiley, New York

  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464

  • Städler N, Bühlmann P, van de Geer S (2010) \(\ell _1\)-penalization for mixture regression models. Test 19:209–256

  • Tang Q, Karunamuni RJ (2013) Minimum distance estimation in a finite mixture regression model. J Multivar Anal 120:185–204

  • Tashman A, Frey RJ (2009) Modeling risk in arbitrage strategies using finite mixtures. Quant Finance 9:495–503

  • Turner TR (2000) Estimating the propagation rate of a viral infection of potato plants via mixtures of regressions. Appl Stat 49:371–384

  • Turner TR (2014) mixreg: functions to fit mixtures of regressions. http://CRAN.R-project.org/package=mixreg. Accessed 11 Jan 2019

  • Van Horn ML, Jaki T, Masyn K et al (2015) Evaluating differential effects using regression interactions and regression mixture models. Educ Psychol Meas 75:677–714

  • Wedel M (2002) Concomitant variables in finite mixture models. Stat Neerl 56:362–375

  • White H (1982) Maximum likelihood estimation of misspecified models. Econometrica 50:1–25

  • Yao F, Fu Y, Lee TCM (2011) Functional mixture regression. Biostatistics 12:341–353

Author information

Correspondence to Giuliano Galimberti.

Electronic supplementary material

Supplementary material 1 (pdf 96 KB)

Appendices

Appendix 1

1.1 Proof of Theorem 1

Using Eqs. (3) and (6) it is possible to write \(l({\varvec{\theta }})=\sum _{i=1}^{I} l_i({\varvec{\theta }})\), where \(l_i({\varvec{\theta }})=\log (\sum _{k=1}^{K}f_{ki})\). By exploiting the result given by equation (A.1) in Boldea and Magnus (2009), the first order differential of \(l({\varvec{\theta }})\) is equal to

$$\begin{aligned} {\mathrm d}l({\varvec{\theta }})=\sum _{i=1}^{I}{\mathrm d}l_i({\varvec{\theta }})=\sum _{i=1}^{I} \left( \sum _{k=1}^{K} \alpha _{ki}{\mathrm d}\log f_{ki}\right) , \end{aligned}$$
(18)

where \(\alpha _{ki}\) is defined in Eq. (7). The first-order differential of \(\log f_{ki}\) is (see Appendix 3)

$$\begin{aligned} {\mathrm d}{\log f_{ki}}&=\left( {\mathrm d}\varvec{\pi }\right) '{\mathbf {a}}_{k}+\left( {\mathrm d}\varvec{\gamma }_{k}\right) '{\mathbf {b}}_{ki} +\left[ {\mathrm d}{\mathrm {vec}}(\varvec{\varPi }'_{k})\right] '{\mathrm {vec}}\left( {\mathbf {x}}_{i}{\mathbf {b}}'_{ki}\right) + \\&\quad -\frac{1}{2}\left[ {\mathrm d}{\mathrm v}({\varvec{\varSigma }}_{k})\right] '{\mathbf {G}}'{\mathrm {vec}}\left( {\mathbf {B}}_{ki}\right) , \end{aligned}$$
(19)

where \(\varvec{a}_{k}\), \({\mathbf {b}}_{ki}\) and \({\mathbf {B}}_{ki}\) are defined in Eqs. (8), (9) and (10), respectively.

Inserting Eq. (19) into Eq. (18) gives

$$\begin{aligned} {\mathrm d}{l({\varvec{\theta }})}&=\left( {\mathrm d}\varvec{\pi }\right) '\sum _{i=1}^{I}\sum _{k=1}^{K}\alpha _{ki}{\mathbf {a}}_{k} + \sum _{k=1}^{K}\left( {\mathrm d}\varvec{\gamma }_{k}\right) '\sum _{i=1}^{I}\alpha _{ki}{\mathbf {b}}_{ki}+ \\&\quad +\sum _{k=1}^{K}\left[ {\mathrm d}{\mathrm {vec}}(\varvec{\varPi }'_{k})\right] '\sum _{i=1}^{I}\alpha _{ki}{\mathrm {vec}}\left( {\mathbf {x}}_{i}{\mathbf {b}}'_{ki}\right) + \\ &\quad -\frac{1}{2}\sum _{k=1}^{K}\left[ {\mathrm d}{\mathrm v}({\varvec{\varSigma }}_{k})\right] '\sum _{i=1}^{I}\alpha _{ki}{\mathbf {G}}'{\mathrm {vec}}\left( {\mathbf {B}}_{ki}\right) . \end{aligned}$$

Taking the derivatives with respect to \(\varvec{\pi }\), \(\varvec{\gamma }_{k}\), \({\mathrm {vec}}(\varvec{\varPi }'_{k})\) and \({\mathrm v}({\varvec{\varSigma }}_{k})\) completes the proof.
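To make Theorem 1 concrete, the sketch below evaluates the per-observation score blocks for \(\varvec{\pi }\), \(\varvec{\gamma }_k\) and \({\mathrm {vec}}(\varvec{\varPi }'_k)\), namely \(\sum _k\alpha _{ki}{\mathbf {a}}_k\), \(\alpha _{ki}{\mathbf {b}}_{ki}\) and \(\alpha _{ki}{\mathrm {vec}}({\mathbf {x}}_i{\mathbf {b}}'_{ki})\), and compares them with central finite differences of \(l_i\). Eqs. (7) to (9) are not reproduced in this section, so the code assumes \({\mathbf {b}}_{ki}={\varvec{\varSigma }}_k^{-1}({\mathbf {y}}_i-\varvec{\gamma }_k-\varvec{\varPi }_k{\mathbf {x}}_i)\) (consistent with Eq. (28)) and a parametrisation with free weights \(\pi _1,\dots ,\pi _{K-1}\) and \(\pi _K=1-\sum _{k<K}\pi _k\), which gives \({\mathbf {a}}_k={\mathbf {e}}_k/\pi _k\) and \({\mathbf {a}}_K=-{\mathbf {1}}/\pi _K\). All function names are hypothetical, and the \({\mathrm v}({\varvec{\varSigma }}_k)\) block is omitted for brevity.

```python
import numpy as np

def log_mvn(y, mean, Sigma):
    """Log-density of N(mean, Sigma) evaluated at y."""
    diff = y - mean
    _, logdet = np.linalg.slogdet(Sigma)
    quad = diff @ np.linalg.solve(Sigma, diff)
    return -0.5 * (y.size * np.log(2 * np.pi) + logdet + quad)

def log_f(y, x, pis, gammas, Pis, Sigmas):
    """Vector of log f_ki = log pi_k + log phi(y; gamma_k + Pi_k x, Sigma_k)."""
    return np.array([np.log(pk) + log_mvn(y, g + P @ x, S)
                     for pk, g, P, S in zip(pis, gammas, Pis, Sigmas)])

def l_i(y, x, pis, gammas, Pis, Sigmas):
    """Observation log-likelihood l_i = log sum_k f_ki, computed stably."""
    return np.logaddexp.reduce(log_f(y, x, pis, gammas, Pis, Sigmas))

def score_i(y, x, pis, gammas, Pis, Sigmas):
    """Score blocks of l_i w.r.t. pi, gamma_k and vec(Pi_k') in Theorem 1 form."""
    lf = log_f(y, x, pis, gammas, Pis, Sigmas)
    alpha = np.exp(lf - np.logaddexp.reduce(lf))      # posterior weights alpha_ki
    K = len(pis)
    # ASSUMPTION: free weights pi_1..pi_{K-1}, with pi_K = 1 - their sum.
    a = [np.eye(K - 1)[k] / pis[k] for k in range(K - 1)]
    a.append(-np.ones(K - 1) / pis[-1])
    d_pi = sum(alpha[k] * a[k] for k in range(K))
    d_gamma, d_vecPiT = [], []
    for k in range(K):
        # ASSUMPTION: b_ki = Sigma_k^{-1} (y_i - gamma_k - Pi_k x_i)
        b = np.linalg.solve(Sigmas[k], y - gammas[k] - Pis[k] @ x)
        d_gamma.append(alpha[k] * b)
        # vec() stacks columns, so vec(x b') is the Fortran-order ravel of x b'.
        d_vecPiT.append(alpha[k] * np.outer(x, b).ravel(order="F"))
    return d_pi, d_gamma, d_vecPiT

# Finite-difference check on a random two-component example.
rng = np.random.default_rng(0)
m, p = 2, 3
y, x = rng.normal(size=m), rng.normal(size=p)
pis = np.array([0.3, 0.7])
gammas = [rng.normal(size=m) for _ in range(2)]
Pis = [rng.normal(size=(m, p)) for _ in range(2)]
Sigmas = [np.eye(m) * (k + 1) + 0.3 * np.ones((m, m)) for k in range(2)]
d_pi, d_gamma, d_vecPiT = score_i(y, x, pis, gammas, Pis, Sigmas)

h = 1e-6
def shifted(arrs, k, j, t):
    """Copy of a parameter list with entry j (C order) of block k shifted by t."""
    out = [A.copy() for A in arrs]
    out[k].flat[j] += t
    return out

# Moving pi_1 up by h moves pi_2 = 1 - pi_1 down by h.
fd_pi = (l_i(y, x, pis + [h, -h], gammas, Pis, Sigmas)
         - l_i(y, x, pis - [h, -h], gammas, Pis, Sigmas)) / (2 * h)
fd_gamma = np.array([(l_i(y, x, pis, shifted(gammas, 0, j, h), Pis, Sigmas)
                      - l_i(y, x, pis, shifted(gammas, 0, j, -h), Pis, Sigmas)) / (2 * h)
                     for j in range(m)])
# The C-order flattening of Pi_k equals vec(Pi_k'), so index j matches d_vecPiT.
fd_Pi = np.array([(l_i(y, x, pis, gammas, shifted(Pis, 1, j, h), Sigmas)
                   - l_i(y, x, pis, gammas, shifted(Pis, 1, j, -h), Sigmas)) / (2 * h)
                  for j in range(m * p)])
```

The key bookkeeping choice is the vec convention: \({\mathrm {vec}}(\varvec{\varPi }'_k)\) stacks the rows of \(\varvec{\varPi }_k\), which is exactly NumPy's default (C-order) flattening of `Pis[k]`.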

Appendix 2

2.1 Proof of Theorem 2

The second order differential of \(l({\varvec{\theta }})\) is given by

$$\begin{aligned} {\mathrm d}^2{l\left( {\varvec{\theta }}\right) }=\sum _{i=1}^{I} {\mathrm d}^2 l_i\left( {\varvec{\theta }}\right) , \end{aligned}$$

where

$$\begin{aligned} {\mathrm d}^2 l_i\left( {\varvec{\theta }}\right) = \sum _{k=1}^K \alpha _{ki}\mathrm {d}^2 \log f_{ki}+\sum _{k=1}^K \alpha _{ki}\left( \mathrm {d} \log f_{ki}\right) ^2-\left( \sum _{k=1}^K \alpha _{ki}\mathrm {d} \log f_{ki}\right) ^2 \end{aligned}$$
(20)

(see equation (A.2) in Boldea and Magnus 2009).
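Eq. (20) is a general identity for the Hessian of a log-sum: writing \({\mathbf {g}}_{ki}\) for the gradient of \(\log f_{ki}\) and \(\bar{{\mathbf {g}}}_i=\sum _k\alpha _{ki}{\mathbf {g}}_{ki}\), it states that the Hessian of \(l_i\) equals \(\sum _k\alpha _{ki}\nabla ^2\log f_{ki}+\sum _k\alpha _{ki}{\mathbf {g}}_{ki}{\mathbf {g}}'_{ki}-\bar{{\mathbf {g}}}_i\bar{{\mathbf {g}}}'_i\). The sketch below checks this numerically for toy quadratic log-terms (an illustrative choice, not the clusterwise regression model) against a finite-difference Hessian; names such as `hessian_via_eq20` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
K, d = 3, 4
c = rng.normal(size=K)
g = rng.normal(size=(K, d))
M = rng.normal(size=(K, d, d))
A = M @ np.transpose(M, (0, 2, 1)) + d * np.eye(d)   # K symmetric positive definite matrices

def u(theta):
    """Toy log-terms u_k(theta) standing in for log f_ki."""
    return c + g @ theta - 0.5 * np.einsum("i,kij,j->k", theta, A, theta)

def l(theta):
    """l(theta) = log sum_k exp(u_k(theta))."""
    return np.logaddexp.reduce(u(theta))

def hessian_via_eq20(theta):
    """Hessian of l assembled term by term as in Eq. (20)."""
    uk = u(theta)
    alpha = np.exp(uk - np.logaddexp.reduce(uk))  # weights alpha_k, summing to one
    grads = g - A @ theta                          # row k: gradient of u_k
    gbar = alpha @ grads                           # alpha-weighted mean gradient
    term1 = -np.einsum("k,kij->ij", alpha, A)      # sum_k alpha_k * Hessian of u_k
    term2 = np.einsum("k,ki,kj->ij", alpha, grads, grads)
    return term1 + term2 - np.outer(gbar, gbar)

def hessian_fd(f, theta, h=1e-4):
    """Central finite-difference Hessian of a scalar function f."""
    n = theta.size
    E = np.eye(n) * h
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = (f(theta + E[i] + E[j]) - f(theta + E[i] - E[j])
                       - f(theta - E[i] + E[j]) + f(theta - E[i] - E[j])) / (4 * h * h)
    return H

theta0 = rng.normal(size=d)
H_eq20 = hessian_via_eq20(theta0)
H_num = hessian_fd(l, theta0)
```

In the clusterwise model the same three terms appear with \({\mathbf {g}}_{ki}\) given by the blocks of Eq. (19) and \(\nabla ^2\log f_{ki}\) given by Eq. (36).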

Equation (36) gives an expression for \({\mathrm d}^2{\log f_{ki}}\), the second order differential of \(\log f_{ki}\). Furthermore, expressions for \(\left( {\mathrm d}\log f_{ki}\right) ^2\) and \(\left( \sum _{k=1}^K\alpha _{ki} {\mathrm d}\log f_{ki}\right) ^2\) can be obtained, after some algebra, by noting that:

$$\begin{aligned} \left( {\mathrm d}\log f_{ki}\right) ^2&= \left( {\mathrm d}\log f_{ki}\right) '\left( {\mathrm d}\log f_{ki}\right) ,\\ \left( \sum _{k=1}^K\alpha _{ki}{\mathrm d}\log f_{ki}\right) ^2& = \left( \sum _{k=1}^K \alpha _{ki} {\mathrm d}\log f_{ki}\right) ' \left( \sum _{k=1}^K\alpha _{ki}{\mathrm d}\log f_{ki}\right) , \end{aligned}$$

and by exploiting the result for \({\mathrm d}\log f_{ki}\) given in Eq. (19). This results in:

$$\begin{aligned} \left( {\mathrm d}\log f_{ki}\right) ^2&=\left( {\mathrm d}\varvec{\pi }\right) '{\mathbf {a}}_k{\mathbf {a}}'_k{\mathrm d}\varvec{\pi } + \left( {\mathrm d}\varvec{\pi }\right) '{\mathbf {a}}_k {\mathbf {b}}'_{ki}{\mathrm d}\varvec{\gamma }_{k} + \\ &\quad +\left( {\mathrm d}\varvec{\pi }\right) '{\mathbf {a}}_k\left[ {\mathrm {vec}}\left( {\mathbf {x}}_i{\mathbf {b}}'_{ki}\right) \right] '\left( {\mathrm d}{\mathrm {vec}}(\varvec{\varPi }'_{k})\right) +\\ &\quad -\frac{1}{2}\left( {\mathrm d}\varvec{\pi }\right) '{\mathbf {a}}_k\left[ {\mathrm {vec}}\left( {\mathbf {B}}_{ki}\right) \right] '{\mathbf {G}}{\mathrm d}{\mathrm v}({\varvec{\varSigma }}_{k}) +\left( {\mathrm d}\varvec{\gamma }_{k}\right) '{\mathbf {b}}_{ki}{\mathbf {a}}'_k{\mathrm d}\varvec{\pi }+\\ &\quad +\left( {\mathrm d}\varvec{\gamma }_{k}\right) '{\mathbf {b}}_{ki}{\mathbf {b}}'_{ki}{\mathrm d}\varvec{\gamma }_{k}+\\ &\quad +\left( {\mathrm d}\varvec{\gamma }_{k}\right) '{\mathbf {b}}_{ki}\left[ {\mathrm {vec}}\left( {\mathbf {x}}_i{\mathbf {b}}'_{ki}\right) \right] '{\mathrm d}{\mathrm {vec}}(\varvec{\varPi }'_{k})+\\ &\quad - \frac{1}{2} \left( {\mathrm d}\varvec{\gamma }_{k}\right) '{\mathbf {b}}_{ki}\left[ {\mathrm {vec}}\left( {\mathbf {B}}_{ki}\right) \right] '{\mathbf {G}}{\mathrm d}{\mathrm v}({\varvec{\varSigma }}_{k})+\\ &\quad +\left[ {\mathrm d}{\mathrm {vec}}\varvec{\varPi }'_{k}\right] '{\mathrm {vec}}\left( {\mathbf {x}}_i{\mathbf {b}}'_{ki}\right) {\mathbf {a}}'_k{\mathrm d}\varvec{\pi }+\\ &\quad +\left[ {\mathrm d}{\mathrm {vec}}(\varvec{\varPi }'_{k})\right] '{\mathrm {vec}}\left( {\mathbf {x}}_i{\mathbf {b}}'_{ki}\right) {\mathbf {b}}'_{ki}{\mathrm d}\varvec{\gamma }_{k} +\\ &\quad +\left[ {\mathrm d}{\mathrm {vec}}(\varvec{\varPi }'_{k})\right] '{\mathrm {vec}}\left( {\mathbf {x}}_i{\mathbf {b}}'_{ki}\right) \left[ {\mathrm {vec}}\left( {\mathbf {x}}_i{\mathbf {b}}'_{ki}\right) \right] '{\mathrm d}{\mathrm {vec}}(\varvec{\varPi }'_{k})+\\ &\quad 
-\frac{1}{2}\left[ {\mathrm d}{\mathrm {vec}}(\varvec{\varPi }'_{k})\right] '{\mathrm {vec}}\left( {\mathbf {x}}_i{\mathbf {b}}'_{ki}\right) \left[ {\mathrm {vec}}\left( {\mathbf {B}}_{ki}\right) \right] '{\mathbf {G}}{\mathrm d}{\mathrm v}({\varvec{\varSigma }}_{k})+\\ &\quad -\frac{1}{2}\left[ {\mathrm d}{\mathrm v}({\varvec{\varSigma }}_{k})\right] '{\mathbf {G}}'{\mathrm {vec}}\left( {\mathbf {B}}_{ki}\right) {\mathbf {a}}'_k{\mathrm d}\varvec{\pi }+\\ &\quad -\frac{1}{2}\left[ {\mathrm d}{\mathrm v}({\varvec{\varSigma }}_{k})\right] '{\mathbf {G}}'{\mathrm {vec}}\left( {\mathbf {B}}_{ki}\right) {\mathbf {b}}'_{ki}{\mathrm d}\varvec{\gamma }_{k}+\\ &\quad -\frac{1}{2}\left[ {\mathrm d}{\mathrm v}({\varvec{\varSigma }}_{k})\right] '{\mathbf {G}}'{\mathrm {vec}}\left( {\mathbf {B}}_{ki}\right) \left[ {\mathrm {vec}}\left( {\mathbf {x}}_i{\mathbf {b}}'_{ki}\right) \right] '{\mathrm d}{\mathrm {vec}}(\varvec{\varPi }'_{k})+\\&\quad +\frac{1}{4}\left[ {\mathrm d}{\mathrm v}({\varvec{\varSigma }}_{k})\right] '{\mathbf {G}}'{\mathrm {vec}}\left( {\mathbf {B}}_{ki}\right) \left[ {\mathrm {vec}}\left( {\mathbf {B}}_{ki}\right) \right] '{\mathbf {G}}{\mathrm d}{\mathrm v}({\varvec{\varSigma }}_{k}), \end{aligned}$$
(21)
$$\begin{aligned} \left( \sum _{k=1}^K\alpha _{ki}{\mathrm d}\log f_{ki}\right) ^2&= \left( {\mathrm d}\varvec{\pi }\right) '\bar{{\mathbf {a}}}_i\bar{{\mathbf {a}}}'_i{\mathrm d}\varvec{\pi }+ \left( {\mathrm d}\varvec{\pi }\right) '\bar{{\mathbf {a}}}_i\left( \sum _{k=1}^K\alpha _{ki}{\mathbf {b}}'_{ki}{\mathrm d}\varvec{\gamma }_{k}\right) +\\ &\quad +\left( {\mathrm d}\varvec{\pi }\right) '\bar{{\mathbf {a}}}_i\left\{ \sum _{k=1}^K\alpha _{ki}\left[ {\mathrm {vec}}\left( {\mathbf {x}}_i{\mathbf {b}}'_{ki}\right) \right] '\left[ {\mathrm d}{\mathrm {vec}}(\varvec{\varPi }'_{k})\right] \right\} +\\ &\quad - \frac{1}{2} \left( {\mathrm d}\varvec{\pi }\right) '\bar{{\mathbf {a}}}_i\left\{ \sum _{k=1}^K\alpha _{ki}\left[ {\mathrm {vec}}\left( {\mathbf {B}}_{ki}\right) \right] '{\mathbf {G}}{\mathrm d}{\mathrm v}\left( {\varvec{\varSigma }}_{k}\right) \right\} + \\ &\quad +\left[ \sum _{k=1}^K\left( {\mathrm d}\varvec{\gamma }_{k}\right) '\alpha _{ki}{\mathbf {b}}_{ki}\right] \bar{{\mathbf {a}}}'_i{\mathrm d}\varvec{\pi } +\sum _{k=1}^K\sum _{l=1}^K\left( {\mathrm d}\varvec{\gamma }_{k}\right) '\alpha _{ki}\alpha _{li}{\mathbf {b}}_{ki}{\mathbf {b}}'_{li}{\mathrm d}\varvec{\gamma }_{l} + \\ &\quad +\sum _{k=1}^K\sum _{l=1}^K\left( {\mathrm d}\varvec{\gamma }_{k}\right) '\alpha _{ki}\alpha _{li}{\mathbf {b}}_{ki}\left[ {\mathrm {vec}}\left( {\mathbf {x}}_i{\mathbf {b}}_{li}'\right) \right] '{\mathrm d}{\mathrm {vec}}(\varvec{\varPi }'_{l})+\\ &\quad - \frac{1}{2}\sum _{k=1}^K\sum _{l=1}^K\left( {\mathrm d}\varvec{\gamma }_{k}\right) '\alpha _{ki}\alpha _{li}{\mathbf {b}}_{ki}\left[ {\mathrm {vec}}\left( {\mathbf {B}}_{li}\right) \right] '{\mathbf {G}}{\mathrm d}{\mathrm v}({\varvec{\varSigma }}_{l})+ \\ &\quad +\left[ \sum _{k=1}^K\left[ {\mathrm d}{\mathrm {vec}}(\varvec{\varPi }'_{k})\right] '\alpha _{ki}{\mathrm {vec}}\left( {\mathbf {x}}_i{\mathbf {b}}'_{ki}\right) \right] \bar{{\mathbf {a}}}'_i{\mathrm d}\varvec{\pi }+\\ &\quad + \sum _{k=1}^K\sum _{l=1}^K\left[ 
{\mathrm d}{\mathrm {vec}}\left( \varvec{\varPi }'_{k}\right) \right] '\alpha _{ki}\alpha _{li}{\mathrm {vec}}\left( {\mathbf {x}}_i{\mathbf {b}}'_{ki}\right) {\mathbf {b}}'_{li}{\mathrm d}\varvec{\gamma }_{l} + \\ &\quad +\sum _{k=1}^K\sum _{l=1}^K\left[ {\mathrm d}{\mathrm {vec}}\left( \varvec{\varPi }'_{k}\right) \right] '\alpha _{ki}\alpha _{li}{\mathrm {vec}}\left( {\mathbf {x}}_i{\mathbf {b}}'_{ki}\right) \left[ {\mathrm {vec}}\left( {\mathbf {x}}_i{\mathbf {b}}'_{li}\right) \right] '{\mathrm d}{\mathrm {vec}}(\varvec{\varPi }'_{l})+\\ &\quad - \frac{1}{2} \sum _{k=1}^K\sum _{l=1}^K\left[ {\mathrm d}{\mathrm {vec}}\left( \varvec{\varPi }'_{k}\right) \right] '\alpha _{ki}\alpha _{li}{\mathrm {vec}}\left( {\mathbf {x}}_i{\mathbf {b}}'_{ki}\right) \left[ {\mathrm {vec}}\left( {\mathbf {B}}_{li}\right) \right] '{\mathbf {G}}{\mathrm d}{\mathrm v}({\varvec{\varSigma }}_{l})+\\ &\quad -\frac{1}{2}\left[ \sum _{k=1}^K\left[ {\mathrm d}{\mathrm v}\left( {\varvec{\varSigma }}_{k}\right) \right] '\alpha _{ki}{\mathbf {G}}'{\mathrm {vec}}\left( {\mathbf {B}}_{ki}\right) \right] \bar{{\mathbf {a}}}'_i{\mathrm d}\varvec{\pi } + \\ &\quad -\frac{1}{2}\sum _{k=1}^K\sum _{l=1}^K\left[ {\mathrm d}{\mathrm v}\left( {\varvec{\varSigma }}_{k}\right) \right] '\alpha _{ki}\alpha _{li}{\mathbf {G}}'{\mathrm {vec}}\left( {\mathbf {B}}_{ki}\right) {\mathbf {b}}'_{li}{\mathrm d}\varvec{\gamma }_{l}+\\ &\quad -\frac{1}{2}\sum _{k=1}^K\sum _{l=1}^K\left[ {\mathrm d}{\mathrm v}\left( {\varvec{\varSigma }}_{k}\right) \right] '\alpha _{ki}\alpha _{li}{\mathbf {G}}'{\mathrm {vec}}\left( {\mathbf {B}}_{ki}\right) \left[ {\mathrm {vec}}\left( {\mathbf {x}}_i{\mathbf {b}}'_{li}\right) \right] '{\mathrm d}{\mathrm {vec}}\left( \varvec{\varPi }'_{l}\right) + \\&\quad +\frac{1}{4}\sum _{k=1}^K\sum _{l=1}^K\left[ {\mathrm d}{\mathrm v}\left( {\varvec{\varSigma }}_{k}\right) \right] '\alpha _{ki}\alpha _{li}{\mathbf {G}}'{\mathrm {vec}}\left( {\mathbf {B}}_{ki}\right) \left[ {\mathrm {vec}}\left( 
{\mathbf {B}}_{li}\right) \right] '{\mathbf {G}}{\mathrm d}{\mathrm v}\left( {\varvec{\varSigma }}_{l}\right) . \end{aligned}$$
(22)

Inserting Eqs. (36), (21) and (22) into Eq. (20) and taking the second-order derivatives leads to

$$\begin{aligned} \frac{\partial ^2l_i({\varvec{\theta }})}{\partial \varvec{\pi }\partial \varvec{\pi }'}&=-\bar{{\mathbf {a}}}_i\bar{{\mathbf {a}}}'_i, \\ \frac{\partial ^2l_i({\varvec{\theta }})}{\partial \varvec{\pi }\partial \varvec{\gamma }'_{k}}&=\alpha _{ki}\left( {\mathbf {a}}_k-\bar{{\mathbf {a}}}_i\right) {\mathbf {b}}'_{ki} \ \forall k,\\ \frac{\partial ^2l_i({\varvec{\theta }})}{\partial \varvec{\pi }\partial \left[ {\mathrm {vec}}(\varvec{\varPi }'_{k})\right] '}&=\alpha _{ki}\left( {\mathbf {a}}_k-\bar{{\mathbf {a}}}_i\right) \left[ {\mathrm {vec}}\left( {\mathbf {x}}_i{\mathbf {b}}'_{ki}\right) \right] ' \ \forall k,\\ \frac{\partial ^2l_i({\varvec{\theta }})}{\partial \varvec{\pi }\partial \left[ {\mathrm v}({\varvec{\varSigma }}_{k})\right] '}&=-\frac{1}{2}\alpha _{ki}\left( {\mathbf {a}}_k-\bar{{\mathbf {a}}}_i\right) \left[ {\mathrm {vec}}\left( {\mathbf {B}}_{ki}\right) \right] '{\mathbf {G}} \ \forall k,\\ \frac{\partial ^2l_i({\varvec{\theta }})}{\partial \varvec{\gamma }_{k}\partial \varvec{\gamma }'_{k}}&= -\alpha _{ki}\left[ {\varvec{\varSigma }}_{k}^{-1}-\left( 1-\alpha _{ki}\right) {\mathbf {b}}_{ki}{\mathbf {b}}'_{ki}\right] \ \forall k,\\ \frac{\partial ^2l_i({\varvec{\theta }})}{\partial \varvec{\gamma }_{k}\partial \varvec{\gamma }'_{l}}&=-\alpha _{ki}\alpha _{li}{\mathbf {b}}_{ki}{\mathbf {b}}'_{li} \ \forall k\ne l, \\ \frac{\partial ^2l_i({\varvec{\theta }})}{\partial \varvec{\gamma }_{k}\partial \left[ {\mathrm {vec}}(\varvec{\varPi }'_{k})\right] '}&=-\alpha _{ki}\left[ {\varvec{\varSigma }}_{k}^{-1}\otimes {\mathbf {x}}'_{i}-\left( 1-\alpha _{ki}\right) {\mathbf {b}}_{ki}\left[ {\mathrm {vec}}\left( {\mathbf {x}}_i{\mathbf {b}}'_{ki}\right) \right] '\right] \ \forall k,\\ \frac{\partial ^2l_i({\varvec{\theta }})}{\partial \varvec{\gamma }_{k}\partial \left[ {\mathrm {vec}}(\varvec{\varPi }'_{l})\right] '}&=-\alpha _{ki}\alpha _{li}{\mathbf {b}}_{ki}\left[ {\mathrm {vec}}\left( {\mathbf {x}}_i{\mathbf {b}}'_{li}\right) \right] ' 
\ \forall k\ne l,\\ \frac{\partial ^2l_i({\varvec{\theta }})}{\partial \varvec{\gamma }_{k}\partial \left[ {\mathrm v}({\varvec{\varSigma }}_{k})\right] '}&=-\alpha _{ki}\left[ \left( {\mathbf {b}}'_{ki}\otimes {\varvec{\varSigma }}_{k}^{-1}\right) +\frac{1}{2}\left( 1-\alpha _{ki}\right) {\mathbf {b}}_{ki}\left[ {\mathrm {vec}}\left( {\mathbf {B}}_{ki}\right) \right] '\right] {\mathbf {G}} \ \forall k,\\ \frac{\partial ^2l_i({\varvec{\theta }})}{\partial \varvec{\gamma }_{k}\partial \left[ {\mathrm v}({\varvec{\varSigma }}_{l})\right] '}&=\frac{1}{2}\alpha _{ki}\alpha _{li}{\mathbf {b}}_{ki}\left[ {\mathrm {vec}}\left( {\mathbf {B}}_{li}\right) \right] '{\mathbf {G}} \ \forall k\ne l,\\ \frac{\partial ^2l_i({\varvec{\theta }})}{\partial {\mathrm {vec}}\left( \varvec{\varPi }'_{k}\right) \partial \left[ {\mathrm {vec}}(\varvec{\varPi }'_{l})\right] '}&=-\alpha _{ki}\alpha _{li}{\mathrm {vec}}\left( {\mathbf {x}}_i{\mathbf {b}}'_{ki}\right) \left[ {\mathrm {vec}}\left( {\mathbf {x}}_i{\mathbf {b}}'_{li}\right) \right] ' \ \forall k\ne l, \\ \frac{\partial ^2l_i({\varvec{\theta }})}{\partial {\mathrm {vec}}\left( \varvec{\varPi }'_{k}\right) \partial \left[ {\mathrm {vec}}(\varvec{\varPi }'_{k})\right] '}&=-\alpha _{ki}\left[ \left( {\varvec{\varSigma }}_{k}^{-1}\otimes ({\mathbf {x}}_{i}{\mathbf {x}}_{i}^{\top })\right) +\right. \\&\quad \left. -\left( 1-\alpha _{ki}\right) {\mathrm {vec}}\left( {\mathbf {x}}_i{\mathbf {b}}_{ki}^{\top }\right) {\mathrm {vec}}\left( {\mathbf {x}}_i{\mathbf {b}}_{ki}^{\top }\right) ^{\top }\right] \ \forall k,\\ \frac{\partial ^2l_i({\varvec{\theta }})}{\partial {\mathrm {vec}}\left( \varvec{\varPi }'_{k}\right) \partial \left[ {\mathrm v}({\varvec{\varSigma }}_{k})\right] '}&= -\alpha _{ki}\left[ \left( {\varvec{\varSigma }}_{k}^{-1}\otimes ({\mathbf {x}}_{i}{\mathbf {b}}'_{ki})\right) \right. \\&\quad \left. 
+\frac{1}{2}\left( 1-\alpha _{ki}\right) {\mathrm {vec}}\left( {\mathbf {x}}_i{\mathbf {b}}'_{ki}\right) \left[ {\mathrm {vec}}\left( {\mathbf {B}}_{ki}\right) \right] '\right] {\mathbf {G}} \ \forall k,\\ \frac{\partial ^2l_i({\varvec{\theta }})}{\partial {\mathrm {vec}}\left( \varvec{\varPi }'_{k}\right) \partial \left[ {\mathrm v}({\varvec{\varSigma }}_{l})\right] '}&=\frac{1}{2}\alpha _{ki}\alpha _{li}{\mathrm {vec}}\left( {\mathbf {x}}_i{\mathbf {b}}'_{ki}\right) \left[ {\mathrm {vec}}\left( {\mathbf {B}}_{li}\right) \right] '{\mathbf {G}} \ \forall k\ne l,\\ \frac{\partial ^2l_i({\varvec{\theta }})}{\partial {\mathrm v}({\varvec{\varSigma }}_{k})\partial \left[ {\mathrm v}({\varvec{\varSigma }}_{k})\right] '}&=-\frac{1}{2}\alpha _{ki}{\mathbf {G}}'\left[ \left( {\varvec{\varSigma }}_{k}^{-1}-2{\mathbf {B}}_{ki}\right) '\otimes {\varvec{\varSigma }}_{k}^{-1}+\right. \\&\quad \left. -\frac{1}{2}\left( 1-\alpha _{ki}\right) {\mathrm {vec}}\left( {\mathbf {B}}_{ki}\right) \left[ {\mathrm {vec}}\left( {\mathbf {B}}_{ki}\right) \right] '\right] {\mathbf {G}} \ \forall k,\\ \frac{\partial ^2l_i({\varvec{\theta }})}{\partial {\mathrm v}({\varvec{\varSigma }}_{k})\partial \left[ {\mathrm v}({\varvec{\varSigma }}_{l})\right] '}&=-\frac{1}{4}\alpha _{ki}\alpha _{li}{\mathbf {G}}'{\mathrm {vec}}\left( {\mathbf {B}}_{ki}\right) \left[ {\mathrm {vec}}\left( {\mathbf {B}}_{li}\right) \right] '{\mathbf {G}} \ \forall k\ne l. \end{aligned}$$

Summing these contributions over the \(I\) observations completes the proof.
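Once per-observation scores and Hessians such as those of Theorems 1 and 2 are available at the maximum likelihood estimate, covariance estimates follow by summation and inversion. The sketch below computes three standard large-sample choices: the inverse of the observed information \(-\sum _i{\mathbf {H}}_i\), the inverse of the outer product of the scores \(\sum _i{\mathbf {s}}_i{\mathbf {s}}'_i\), and the sandwich combination associated with White (1982), which appears in the reference list. This section does not say which two estimators the paper proposes, so this is an illustration of the general idea, not the authors' implementation. A one-parameter normal-mean model with known variance stands in for the mixture so that every result can be checked by hand; the function name is hypothetical.

```python
import numpy as np

def covariance_estimates(scores, hessians):
    """scores: (n, q) per-observation scores at the MLE;
    hessians: (n, q, q) per-observation Hessians at the MLE."""
    info_hess = -hessians.sum(axis=0)     # observed information matrix
    info_opg = scores.T @ scores          # outer product of gradients
    cov_hess = np.linalg.inv(info_hess)
    cov_opg = np.linalg.inv(info_opg)
    cov_sandwich = cov_hess @ info_opg @ cov_hess
    return cov_hess, cov_opg, cov_sandwich

# Toy model: x_i ~ N(mu, sigma^2) with sigma known, so the MLE of mu is the sample mean.
rng = np.random.default_rng(2)
sigma, n = 2.0, 50
x = rng.normal(loc=1.0, scale=sigma, size=n)
mu_hat = x.mean()
scores = ((x - mu_hat) / sigma**2)[:, None]        # shape (n, 1)
hessians = np.full((n, 1, 1), -1.0 / sigma**2)     # shape (n, 1, 1)
cov_hess, cov_opg, cov_sandwich = covariance_estimates(scores, hessians)
```

Here `cov_hess` reduces to \(\sigma ^2/n\), as expected for a sample mean; standard errors are the square roots of the diagonal, e.g. `np.sqrt(np.diag(cov_hess))`.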

Appendix 3

3.1 First order differential of \(\log f_{ki}\)

Up to an additive constant, \(\log f_{ki}\) in Eq. (18) is equal to

$$\begin{aligned} \log \pi _{k}-\frac{1}{2}\log \det \left( {\varvec{\varSigma }}_{k}\right) -\frac{1}{2}\mathrm {tr}\left( {\varvec{\varSigma }}_{k}^{-1}\left( \varvec{y}_{i}-\varvec{\gamma }_{k}-\varvec{\varPi }_{k}\varvec{x}_{i}\right) \left( \varvec{y}_{i}-\varvec{\gamma }_{k}-\varvec{\varPi }_{k}\varvec{x}_{i}\right) '\right) . \end{aligned}$$

Its first-order differential is therefore

$$\begin{aligned} {\mathrm d}{\log f_{ki}}={\mathrm d}_{k0}+{\mathrm d}_{ki1}+{\mathrm d}_{ki2}+{\mathrm d}_{ki3}, \end{aligned}$$
(23)

where

$$\begin{aligned} {\mathrm d}_{k0}&= {\mathrm d}{\log \pi _{k}}=\left( {\mathrm d}\varvec{\pi }\right) '\varvec{a}_{k},\\ {\mathrm d}_{ki1}&=-\frac{1}{2}{\mathrm d}\left( \log \det \left( {\varvec{\varSigma }}_{k}\right) \right) ,\\ {\mathrm d}_{ki2}&=-\frac{1}{2}\mathrm {tr}\left[ {\mathrm d}\left( {\varvec{\varSigma }}_{k}^{-1}\right) \left( {\mathbf {y}}_{i}-\varvec{\gamma }_{k}-\varvec{\varPi }_{k}{\mathbf {x}}_{i}\right) \left( {\mathbf {y}}_{i}-\varvec{\gamma }_{k}-\varvec{\varPi }_{k}{\mathbf {x}}_{i}\right) '\right] ,\\ {\mathrm d}_{ki3}&=-\frac{1}{2}\mathrm {tr}\left[ {\varvec{\varSigma }}_{k}^{-1}{\mathrm d}\left[ \left( {\mathbf {y}}_{i}-\varvec{\gamma }_{k}-\varvec{\varPi }_{k}{\mathbf {x}}_{i}\right) \left( {\mathbf {y}}_{i}-\varvec{\gamma }_{k}-\varvec{\varPi }_{k}{\mathbf {x}}_{i}\right) '\right] \right] . \end{aligned}$$
(24)

Using Corollary 9.1.1 and Theorem 1.3 in Schott (2005), it follows that

$$\begin{aligned} {\mathrm d}_{ki1} =-\frac{1}{2}\mathrm {tr}\left[ \left( {\mathrm d}{\varvec{\varSigma }}_{k}\right) {\varvec{\varSigma }}_{k}^{-1}\right] . \end{aligned}$$
(25)

Furthermore, since \({\mathrm d}({\varvec{\varSigma }}_{k}^{-1})=-{\varvec{\varSigma }}_{k}^{-1}{\mathrm d}({\varvec{\varSigma }}_{k}){\varvec{\varSigma }}_{k}^{-1}\) (see, e.g., Magnus and Neudecker 1988, p. 183), it is possible to write

$$\begin{aligned} {\mathrm d}_{ki2}=\frac{1}{2}\mathrm {tr}\left[ \left( {\mathrm d}{\varvec{\varSigma }}_{k}\right) {\mathbf {b}}_{ki}{\mathbf {b}}'_{ki}\right] . \end{aligned}$$
(26)
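Eqs. (25) and (26), and the rule for the differential of \({\varvec{\varSigma }}_{k}^{-1}\) used to obtain Eq. (26), can be checked with directional finite differences along a symmetric perturbation \({\mathbf {E}}\) that plays the role of \({\mathrm d}{\varvec{\varSigma }}_{k}\). Since Eq. (9) is not shown here, the sketch assumes \({\mathbf {b}}_{ki}={\varvec{\varSigma }}_k^{-1}({\mathbf {y}}_i-\varvec{\gamma }_k-\varvec{\varPi }_k{\mathbf {x}}_i)\); the residual is represented by a random vector `e`, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
m = 3
M = rng.normal(size=(m, m))
Sigma = M @ M.T + m * np.eye(m)          # symmetric positive definite Sigma_k
e = rng.normal(size=m)                   # stands for y_i - gamma_k - Pi_k x_i
E = rng.normal(size=(m, m))
E = E + E.T                              # symmetric direction playing dSigma_k
Sinv = np.linalg.inv(Sigma)
b = Sinv @ e                             # ASSUMED definition of b_ki

def dir_deriv(f, h=1e-6):
    """Central difference of f(Sigma + t E) at t = 0."""
    return (f(Sigma + h * E) - f(Sigma - h * E)) / (2 * h)

# Eq. (25): d(-1/2 log det Sigma) = -1/2 tr(dSigma Sigma^{-1})
d1_num = dir_deriv(lambda S: -0.5 * np.linalg.slogdet(S)[1])
d1_formula = -0.5 * np.trace(E @ Sinv)

# Eq. (26): d(-1/2 e' Sigma^{-1} e) = 1/2 tr(dSigma b b')
d2_num = dir_deriv(lambda S: -0.5 * e @ np.linalg.solve(S, e))
d2_formula = 0.5 * np.trace(E @ np.outer(b, b))

# Rule: d(Sigma^{-1}) = -Sigma^{-1} dSigma Sigma^{-1}
dinv_num = dir_deriv(np.linalg.inv)
dinv_formula = -Sinv @ E @ Sinv
```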

By exploiting Theorem 8.10 in Schott (2005), some results on the differentials of matrix functions (see, e.g., Magnus and Neudecker 1988, p. 182) and some properties of the vec operator (see, e.g., Schott 2005, pp. 313 and 356), we find

$$\begin{aligned} {\mathrm d}_{ki1}+{\mathrm d}_{ki2}= -\frac{1}{2}\left[ {\mathrm d}{\mathrm v}({\varvec{\varSigma }}_{k})\right] '{\mathbf {G}}'{\mathrm {vec}}\left( {\mathbf {B}}_{ki}\right) , \end{aligned}$$
(27)
$$\begin{aligned} {\mathrm d}_{ki3}= \left( {\mathrm d}\varvec{\gamma }_{k}\right) '{\mathbf {b}}_{ki}+\left[ {\mathrm d}{\mathrm {vec}}\left( \varvec{\varPi }'_{k}\right) \right] '{\mathrm {vec}}\left( {\mathbf {x}}_{i}{\mathbf {b}}'_{ki}\right) . \end{aligned}$$
(28)

Substituting Eqs. (24), (27) and (28) into Eq. (23) leads to

$$\begin{aligned} {\mathrm d}{\log f_{ki}}&=\left( {\mathrm d}\varvec{\pi }\right) '{\mathbf {a}}_{k}+\left( {\mathrm d}\varvec{\gamma }_{k}\right) '{\mathbf {b}}_{ki} +\left[ {\mathrm d}{\mathrm {vec}}(\varvec{\varPi }'_{k})\right] '{\mathrm {vec}}\left( {\mathbf {x}}_{i}{\mathbf {b}}'_{ki}\right) + \\&\quad -\frac{1}{2}\left[ {\mathrm d}{\mathrm v}({\varvec{\varSigma }}_{k})\right] '{\mathbf {G}}'{\mathrm {vec}}\left( {\mathbf {B}}_{ki}\right) . \end{aligned}$$
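The \({\mathrm v}({\varvec{\varSigma }}_{k})\) term above rests on the duplication matrix. The sketch below assumes that \({\mathrm v}(\cdot )\) stacks the lower triangle column by column, that \({\mathbf {G}}\) satisfies \({\mathrm {vec}}({\varvec{\varSigma }})={\mathbf {G}}\,{\mathrm v}({\varvec{\varSigma }})\), and that \({\mathbf {B}}_{ki}={\varvec{\varSigma }}_k^{-1}-{\mathbf {b}}_{ki}{\mathbf {b}}'_{ki}\), as implied by Eqs. (25) to (27); Eqs. (9) and (10) are not reproduced in this section, so these are assumptions. It compares \(-\frac{1}{2}{\mathbf {G}}'{\mathrm {vec}}({\mathbf {B}}_{ki})\) with a finite-difference gradient of the Gaussian log-density with respect to \({\mathrm v}({\varvec{\varSigma }}_{k})\). Helper names are hypothetical.

```python
import numpy as np

def vech(S):
    """Stack the lower triangle of S column by column (assumed meaning of v)."""
    m = S.shape[0]
    return np.array([S[i, j] for j in range(m) for i in range(j, m)])

def unvech(v, m):
    """Rebuild the symmetric m x m matrix whose vech is v."""
    S = np.zeros((m, m))
    k = 0
    for j in range(m):
        for i in range(j, m):
            S[i, j] = S[j, i] = v[k]
            k += 1
    return S

def duplication_matrix(m):
    """G with vec(S) = G @ vech(S) for symmetric S, vec being column-major."""
    pairs = [(i, j) for j in range(m) for i in range(j, m)]
    G = np.zeros((m * m, len(pairs)))
    for col, (i, j) in enumerate(pairs):
        G[i + j * m, col] = 1.0   # position of S[i, j] in vec(S)
        G[j + i * m, col] = 1.0   # position of the symmetric entry S[j, i]
    return G

rng = np.random.default_rng(4)
m = 3
M = rng.normal(size=(m, m))
Sigma = M @ M.T + m * np.eye(m)
e = rng.normal(size=m)                  # stands for y_i - gamma_k - Pi_k x_i
G = duplication_matrix(m)

def log_phi(v):
    """Gaussian log-density of the residual e as a function of v(Sigma)."""
    S = unvech(v, m)
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * (m * np.log(2 * np.pi) + logdet + e @ np.linalg.solve(S, e))

b = np.linalg.solve(Sigma, e)                  # ASSUMED b_ki
B = np.linalg.inv(Sigma) - np.outer(b, b)      # ASSUMED B_ki
grad_formula = -0.5 * G.T @ B.ravel(order="F")

v0, h = vech(Sigma), 1e-6
grad_num = np.array([(log_phi(v0 + h * u) - log_phi(v0 - h * u)) / (2 * h)
                     for u in np.eye(v0.size)])
```

Perturbing an off-diagonal element of \({\mathrm v}({\varvec{\varSigma }}_{k})\) moves two entries of \({\varvec{\varSigma }}_{k}\) at once; the duplication matrix accounts for this, which is why \({\mathbf {G}}'\) appears in Eq. (27).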

Appendix 4

4.1 Second order differential of \(\log f_{ki}\)

Using Eq. (6), the second order differential of \(\log f_{ki}\) can be expressed as

$$\begin{aligned} {\mathrm d}^2{\log f_{ki}}={\mathrm d}^2{\log \pi _{k}}+{\mathrm d}^2{\log \phi \left( {\mathbf {y}}_{i};\varvec{\gamma }_{k}+\varvec{\varPi }_{k}{\mathbf {x}}_{i},{\varvec{\varSigma }}_{k}\right) }, \end{aligned}$$
(29)

where

$$\begin{aligned} {\mathrm d}^2{\log \pi _{k}}&= -\left( {\mathrm d}\varvec{\pi }\right) '{\mathbf {a}}_{k}{\mathbf {a}}'_{k}{\mathrm d}\varvec{\pi },\\ {\mathrm d}^2{\log \phi \left( {\mathbf {y}}_{i};\varvec{\gamma }_{k}+\varvec{\varPi }_{k}{\mathbf {x}}_{i},{\varvec{\varSigma }}_{k}\right) }&= {\mathrm d}({\mathrm d}_{ki1}) + {\mathrm d}({\mathrm d}_{ki2}) + {\mathrm d}({\mathrm d}_{ki3}), \end{aligned}$$
(30)

and \({\mathrm d}_{ki1}\), \({\mathrm d}_{ki2}\) and \({\mathrm d}_{ki3}\) are defined in Eqs. (25), (26), and (28), respectively. Thus, it is possible to write

$$\begin{aligned} {\mathrm d}\left( {\mathrm d}_{ki1}\right) =-\frac{1}{2}\mathrm {tr}\left[ \left( {\mathrm d}{\varvec{\varSigma }}_{k}\right) \left( {\mathrm d}{\varvec{\varSigma }}_{k}^{-1}\right) \right] =\frac{1}{2}\mathrm {tr}\left[ \left( {\mathrm d}{\varvec{\varSigma }}_{k}\right) {\varvec{\varSigma }}_{k}^{-1}\left( {\mathrm d}{\varvec{\varSigma }}_{k}\right) {\varvec{\varSigma }}_{k}^{-1}\right] . \end{aligned}$$
(31)

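where the last equality follows from the rule for the differential of the inverse of a nonsingular matrix, \({\mathrm d}({\varvec{\varSigma }}_{k}^{-1})=-{\varvec{\varSigma }}_{k}^{-1}({\mathrm d}{\varvec{\varSigma }}_{k}){\varvec{\varSigma }}_{k}^{-1}\) (see, e.g., Magnus and Neudecker 1988, page 183). Furthermore,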

$$\begin{aligned} {\mathrm d}\left( {\mathrm d}_{ki2}\right)&= {\mathrm d}\left( \frac{1}{2}\mathrm {tr}\left[ {\varvec{\varSigma }}_{k}^{-1}\left( {\mathrm d}{\varvec{\varSigma }}_{k}\right) {\varvec{\varSigma }}_{k}^{-1}\left( {\mathbf {y}}_{i}-\varvec{\gamma }_{k}-\varvec{\varPi }_{k}{\mathbf {x}}_{i}\right) \left( {\mathbf {y}}_{i}-\varvec{\gamma }_{k}-\varvec{\varPi }_{k}{\mathbf {x}}_{i}\right) '\right] \right) \\ & = \frac{1}{2}\mathrm {tr}\left[ {\mathrm d}\left( {\varvec{\varSigma }}_{k}^{-1}\right) \left( {\mathrm d}{\varvec{\varSigma }}_{k}\right) {\varvec{\varSigma }}_{k}^{-1}\left( {\mathbf {y}}_{i}-\varvec{\gamma }_{k}-\varvec{\varPi }_{k}{\mathbf {x}}_{i}\right) \left( {\mathbf {y}}_{i}-\varvec{\gamma }_{k}-\varvec{\varPi }_{k}{\mathbf {x}}_{i}\right) '\right] \\ &\quad +\frac{1}{2}\mathrm {tr}\left[ {\varvec{\varSigma }}_{k}^{-1}\left( {\mathrm d}{\varvec{\varSigma }}_{k}\right) {\mathrm d}\left( {\varvec{\varSigma }}_{k}^{-1}\right) \left( {\mathbf {y}}_{i}-\varvec{\gamma }_{k}-\varvec{\varPi }_{k}{\mathbf {x}}_{i}\right) \left( {\mathbf {y}}_{i}-\varvec{\gamma }_{k}-\varvec{\varPi }_{k}{\mathbf {x}}_{i}\right) '\right] \\ &\quad +\frac{1}{2}\mathrm {tr}\left\{ {\varvec{\varSigma }}_{k}^{-1}\left( {\mathrm d}{\varvec{\varSigma }}_{k}\right) {\varvec{\varSigma }}_{k}^{-1}{\mathrm d}\left[ \left( {\mathbf {y}}_{i}-\varvec{\gamma }_{k}-\varvec{\varPi }_{k}{\mathbf {x}}_{i}\right) \left( {\mathbf {y}}_{i}-\varvec{\gamma }_{k}-\varvec{\varPi }_{k}{\mathbf {x}}_{i}\right) '\right] \right\} \\ & = -\mathrm {tr}\left[ \left( {\mathrm d}{\varvec{\varSigma }}_{k}\right) {\varvec{\varSigma }}_{k}^{-1}\left( {\mathrm d}{\varvec{\varSigma }}_{k}\right) {\varvec{\varSigma }}_{k}^{-1}\left( {\mathbf {y}}_{i}-\varvec{\gamma }_{k}-\varvec{\varPi }_{k}{\mathbf {x}}_{i}\right) \left( {\mathbf {y}}_{i}-\varvec{\gamma }_{k}-\varvec{\varPi }_{k}{\mathbf {x}}_{i}\right) '{\varvec{\varSigma }}_{k}^{-1}\right] \\ &\quad -\left( {\mathrm d}\varvec{\gamma }_{k}\right) '\left( {\mathbf {b}}'_{ki}\otimes {\varvec{\varSigma }}_{k}^{-1}\right) {\mathrm d}{\mathrm {vec}}\left( {\varvec{\varSigma }}_{k}\right) \\ &\quad -\left[ {\mathrm d}{\mathrm {vec}}\left( \varvec{\varPi }'_{k}\right) \right] '\left[ {\varvec{\varSigma }}_{k}^{-1}\otimes \left( {\mathbf {x}}_{i}{\mathbf {b}}'_{ki}\right) \right] {\mathrm d}{\mathrm {vec}}\left( {\varvec{\varSigma }}_{k}\right) , \end{aligned}$$
(32)

where the last equality is obtained using properties of the trace and vec operators (see, e.g., Schott 2005, Theorems 8.9, 8.10 and 8.11). As far as \({\mathrm d}({\mathrm d}_{ki3})\) is concerned, since

$$\begin{aligned} {\mathrm d}\left( {\mathrm d}_{ki3}\right) =\left( {\mathrm d}\varvec{\gamma }_{k}\right) '{\mathrm d}{\mathbf {b}}_{ki}+ \left[ {\mathrm d}{\mathrm {vec}}\left( \varvec{\varPi }'_{k}\right) \right] '{\mathrm d}{\mathrm {vec}}\left( {\mathbf {x}}_{i}{\mathbf {b}}'_{ki}\right) , \end{aligned}$$
(33)

an expression for \({\mathrm d}{\mathbf {b}}_{ki}\) is required. It is given by

$$\begin{aligned} {\mathrm d}\left( {\mathbf {b}}_{ki}\right) = -{\varvec{\varSigma }}_{k}^{-1}{\mathrm d}\left( {\varvec{\varSigma }}_{k}\right) {\mathbf {b}}_{ki}-{\varvec{\varSigma }}_{k}^{-1}\left( {\mathrm d}\varvec{\gamma }_{k}\right) -{\varvec{\varSigma }}_{k}^{-1}\left( {\mathrm d}\varvec{\varPi }_{k}\right) {\mathbf {x}}_{i}. \end{aligned}$$
(34)
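Equations (31), (32) and (34) can each be checked by differentiating the corresponding first-order quantity numerically along a fixed direction \(({\mathrm d}\varvec{\gamma }_{k},{\mathrm d}\varvec{\varPi }_{k},{\mathrm d}{\varvec{\varSigma }}_{k})\); holding the direction fixed matches the convention that second-order differentials are taken with the first-order differentials constant. The sketch below is a minimal NumPy check under stated assumptions: \({\mathrm d}_{ki1}=-\frac{1}{2}\mathrm {tr}({\varvec{\varSigma }}_{k}^{-1}{\mathrm d}{\varvec{\varSigma }}_{k})\) (consistent with (31)), \({\mathrm d}_{ki2}\) as in the first line of (32), \({\mathbf {b}}_{ki}={\varvec{\varSigma }}_{k}^{-1}({\mathbf {y}}_{i}-\varvec{\gamma }_{k}-\varvec{\varPi }_{k}{\mathbf {x}}_{i})\) (consistent with (34)), and synthetic parameter values. The helper names are illustrative.

```python
import numpy as np

def vec(A):
    """Column-stacking vec operator."""
    return A.reshape(-1, order="F")

rng = np.random.default_rng(1)
p, q = 3, 2
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)            # well-conditioned Sigma_k
gamma, Pi = rng.normal(size=p), rng.normal(size=(p, q))
x, y = rng.normal(size=q), rng.normal(size=p)
dg, dP = rng.normal(size=p), rng.normal(size=(p, q))
M = rng.normal(size=(p, p))
dS = 0.1 * (M + M.T)                       # symmetric direction for Sigma_k

def at(t):
    """Parameters moved a distance t along (dg, dP, dS)."""
    return gamma + t * dg, Pi + t * dP, Sigma + t * dS

def b_of(g, P, S):
    """Assumed form b_ki = Sigma_k^{-1}(y_i - gamma_k - Pi_k x_i)."""
    return np.linalg.solve(S, y - g - P @ x)

def d_ki1(g, P, S):
    """Assumed d_ki1 = -0.5 tr(Sigma^{-1} dS); g and P are unused."""
    return -0.5 * np.trace(np.linalg.solve(S, dS))

def d_ki2(g, P, S):
    """0.5 tr[Sigma^{-1} dS Sigma^{-1} e e'] with e = y - gamma - Pi x."""
    Si = np.linalg.inv(S)
    e = y - g - P @ x
    return 0.5 * np.trace(Si @ dS @ Si @ np.outer(e, e))

def ddir(fun, h=1e-6):
    """Central-difference derivative of fun along the fixed direction."""
    return (np.asarray(fun(*at(h))) - np.asarray(fun(*at(-h)))) / (2 * h)

Si = np.linalg.inv(Sigma)
e = y - gamma - Pi @ x
b = Si @ e

eq31 = 0.5 * np.trace(dS @ Si @ dS @ Si)
eq32 = (-np.trace(dS @ Si @ dS @ Si @ np.outer(e, e) @ Si)
        - dg @ np.kron(b[None, :], Si) @ vec(dS)                # (b' kron Sigma^{-1})
        - vec(dP.T) @ np.kron(Si, np.outer(x, b)) @ vec(dS))    # Sigma^{-1} kron (x b')
eq34 = -Si @ dS @ b - Si @ dg - Si @ dP @ x

err31 = abs(eq31 - ddir(d_ki1))
err32 = abs(eq32 - ddir(d_ki2))
err34 = np.max(np.abs(eq34 - ddir(b_of)))
print(err31, err32, err34)                 # all three should be tiny
```

Note that the cross terms in the last expression of (32) carry unit coefficients; the numerical check confirms this.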

Substituting Eq. (34) in (33) and using some properties of the vec operator and the Kronecker product leads to the following result:

$$\begin{aligned} {\mathrm d}\left( {\mathrm d}_{ki3}\right)&= -\left[ {\mathrm d}{\mathrm v}\left( {\varvec{\varSigma }}_{k}\right) \right] '{\mathbf {G}}'\left( {\mathbf {b}}_{ki}\otimes {\varvec{\varSigma }}_{k}^{-1}\right) {\mathrm d}\varvec{\gamma }_{k}-\left( {\mathrm d}\varvec{\gamma }_{k}\right) '{\varvec{\varSigma }}_{k}^{-1}{\mathrm d}\varvec{\gamma }_{k} \\&\quad -\left( {\mathrm d}\varvec{\gamma }_{k}\right) '\left[ {\varvec{\varSigma }}_{k}^{-1}\otimes {\mathbf {x}}'_{i}\right] {\mathrm d}{\mathrm {vec}}\left( \varvec{\varPi }_{k}'\right) -\left[ {\mathrm d}{\mathrm v}\left( {\varvec{\varSigma }}_{k}\right) \right] '{\mathbf {G}}'\left[ {\varvec{\varSigma }}_{k}^{-1}\otimes \left( {\mathbf {b}}_{ki}{\mathbf {x}}'_{i}\right) \right] {\mathrm d}{\mathrm {vec}}\left( \varvec{\varPi }'_{k}\right) \\&\quad -\left[ {\mathrm d}{\mathrm {vec}}\left( \varvec{\varPi }'_{k}\right) \right] '\left( {\varvec{\varSigma }}_{k}^{-1}\otimes {\mathbf {x}}_{i}\right) {\mathrm d}\varvec{\gamma }_{k} -\left[ {\mathrm d}{\mathrm {vec}}\left( \varvec{\varPi }'_{k}\right) \right] '\left[ {\varvec{\varSigma }}_{k}^{-1}\otimes \left( {\mathbf {x}}_{i}{\mathbf {x}}'_{i}\right) \right] {\mathrm d}{\mathrm {vec}}\left( \varvec{\varPi }'_{k}\right) . \end{aligned}$$
(35)
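The six terms of (35) can be verified numerically by differentiating \({\mathrm d}_{ki3}\) from (28) along a fixed direction, with \({\mathrm d}{\mathbf {b}}_{ki}\) entering only through the change in \({\mathbf {b}}_{ki}\). The sketch below assumes \({\mathbf {b}}_{ki}={\varvec{\varSigma }}_{k}^{-1}({\mathbf {y}}_{i}-\varvec{\gamma }_{k}-\varvec{\varPi }_{k}{\mathbf {x}}_{i})\) and takes \({\mathbf {G}}\) to be the duplication matrix, so \({\mathrm {vec}}({\mathrm d}{\varvec{\varSigma }}_{k})\) replaces \({\mathbf {G}}\,{\mathrm d}{\mathrm v}({\varvec{\varSigma }}_{k})\); inputs are synthetic and the names are illustrative.

```python
import numpy as np

def vec(A):
    """Column-stacking vec operator."""
    return A.reshape(-1, order="F")

rng = np.random.default_rng(3)
p, q = 3, 2
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)
gamma, Pi = rng.normal(size=p), rng.normal(size=(p, q))
x, y = rng.normal(size=q), rng.normal(size=p)
dg, dP = rng.normal(size=p), rng.normal(size=(p, q))
M = rng.normal(size=(p, p))
dS = 0.1 * (M + M.T)                       # symmetric direction for Sigma_k

def d_ki3(t):
    """Eq. (28) at psi + t * d psi, with the direction (dg, dP) held fixed."""
    b = np.linalg.solve(Sigma + t * dS, y - (gamma + t * dg) - (Pi + t * dP) @ x)
    return dg @ b + vec(dP.T) @ vec(np.outer(x, b))

Si = np.linalg.inv(Sigma)
b = Si @ (y - gamma - Pi @ x)
vS, vP = vec(dS), vec(dP.T)                # G dv(Sigma_k) and d vec(Pi_k')
bc, xc = b[:, None], x[:, None]            # column vectors b_ki and x_i
analytic = (-vS @ np.kron(bc, Si) @ dg                     # (b kron Sigma^{-1})
            - dg @ Si @ dg
            - dg @ np.kron(Si, xc.T) @ vP                  # Sigma^{-1} kron x'
            - vS @ np.kron(Si, np.outer(b, x)) @ vP        # Sigma^{-1} kron (b x')
            - vP @ np.kron(Si, xc) @ dg                    # Sigma^{-1} kron x
            - vP @ np.kron(Si, np.outer(x, x)) @ vP)       # Sigma^{-1} kron (x x')
h = 1e-6
numeric = (d_ki3(h) - d_ki3(-h)) / (2 * h)
print(analytic, numeric)                   # should agree closely
```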

By inserting Eqs. (30), (31), (32) and (35) in (29) and after some algebra, the following expression for the second order differential of \(\log f_{ki}\) is obtained:

$$\begin{aligned} {\mathrm d}^2{\log f_{ki}}&= -\left( {\mathrm d}\varvec{\pi }\right) '{\mathbf {a}}_{k}{\mathbf {a}}'_{k}{\mathrm d}\varvec{\pi }-\frac{1}{2}\left[ {\mathrm d}{\mathrm v}\left( {\varvec{\varSigma }}_{k}\right) \right] '{\mathbf {G}}'\left[ \left( {\varvec{\varSigma }}_{k}^{-1}-2{\mathbf {B}}_{ki}\right) '\otimes {\varvec{\varSigma }}_{k}^{-1}\right] {\mathbf {G}}{\mathrm d}{\mathrm v}\left( {\varvec{\varSigma }}_{k}\right) \\&\quad -\left( {\mathrm d}\varvec{\gamma }_{k}\right) '\left( {\mathbf {b}}'_{ki}\otimes {\varvec{\varSigma }}_{k}^{-1}\right) {\mathbf {G}}{\mathrm d}{\mathrm v}\left( {\varvec{\varSigma }}_{k}\right) -\left[ {\mathrm d}{\mathrm {vec}}\left( \varvec{\varPi }'_{k}\right) \right] '\left[ {\varvec{\varSigma }}_{k}^{-1}\otimes \left( {\mathbf {x}}_{i}{\mathbf {b}}'_{ki}\right) \right] {\mathbf {G}}{\mathrm d}{\mathrm v}\left( {\varvec{\varSigma }}_{k}\right) \\&\quad -\left[ {\mathrm d}{\mathrm v}\left( {\varvec{\varSigma }}_{k}\right) \right] '{\mathbf {G}}'\left( {\mathbf {b}}_{ki}\otimes {\varvec{\varSigma }}_{k}^{-1}\right) {\mathrm d}\varvec{\gamma }_{k} -\left( {\mathrm d}\varvec{\gamma }_{k}\right) '{\varvec{\varSigma }}_{k}^{-1}{\mathrm d}\varvec{\gamma }_{k} \\&\quad -\left( {\mathrm d}\varvec{\gamma }_{k}\right) '\left[ {\varvec{\varSigma }}_{k}^{-1}\otimes {\mathbf {x}}'_{i}\right] {\mathrm d}{\mathrm {vec}}\left( \varvec{\varPi }'_{k}\right) -\left[ {\mathrm d}{\mathrm v}\left( {\varvec{\varSigma }}_{k}\right) \right] '{\mathbf {G}}'\left[ {\varvec{\varSigma }}_{k}^{-1}\otimes \left( {\mathbf {b}}_{ki}{\mathbf {x}}'_{i}\right) \right] {\mathrm d}{\mathrm {vec}}\left( \varvec{\varPi }'_{k}\right) \\&\quad -\left[ {\mathrm d}{\mathrm {vec}}\left( \varvec{\varPi }'_{k}\right) \right] '\left( {\varvec{\varSigma }}_{k}^{-1}\otimes {\mathbf {x}}_{i}\right) {\mathrm d}\varvec{\gamma }_{k} -\left[ {\mathrm d}{\mathrm {vec}}\left( \varvec{\varPi }'_{k}\right) \right] '\left[ {\varvec{\varSigma }}_{k}^{-1}\otimes \left( {\mathbf {x}}_{i}{\mathbf {x}}'_{i}\right) \right] {\mathrm d}{\mathrm {vec}}\left( \varvec{\varPi }'_{k}\right) . \end{aligned}$$
(36)
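Because (36) is a quadratic form in the direction \(({\mathrm d}\varvec{\pi },{\mathrm d}\varvec{\gamma }_{k},{\mathrm d}\varvec{\varPi }_{k},{\mathrm d}{\varvec{\varSigma }}_{k})\), it must equal the second derivative of \(\log f_{ki}\) along the straight line through the current parameter value in that direction. The NumPy sketch below checks this with a central second difference. It assumes \({\mathbf {b}}_{ki}={\varvec{\varSigma }}_{k}^{-1}({\mathbf {y}}_{i}-\varvec{\gamma }_{k}-\varvec{\varPi }_{k}{\mathbf {x}}_{i})\), \({\mathbf {B}}_{ki}={\varvec{\varSigma }}_{k}^{-1}-{\mathbf {b}}_{ki}{\mathbf {b}}'_{ki}\), and \({\mathbf {G}}\) equal to the duplication matrix (so \({\mathrm {vec}}({\mathrm d}{\varvec{\varSigma }}_{k})\) replaces \({\mathbf {G}}\,{\mathrm d}{\mathrm v}({\varvec{\varSigma }}_{k})\)). For the weights it uses a hypothetical parametrisation with \(K=3\), free weights \((\pi _1,\pi _2)\) and \(\pi _3=1-\pi _1-\pi _2\), evaluated for component \(k=1\), so that \({\mathbf {a}}_{1}=(1/\pi _1,0)'\); the paper's own definition of \({\mathbf {a}}_{k}\) precedes this excerpt and may differ.

```python
import numpy as np

def vec(A):
    """Column-stacking vec operator."""
    return A.reshape(-1, order="F")

def log_phi(y, mean, Sigma):
    """Log-density of N_p(mean, Sigma) evaluated at y."""
    r = y - mean
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (y.size * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(Sigma, r))

def d2_log_f(y, x, a, dpi, gamma, Pi, Sigma, dg, dP, dS):
    """Eq. (36), with vec(dS) in place of G dv(Sigma_k) for a symmetric dS."""
    Si = np.linalg.inv(Sigma)
    b = Si @ (y - gamma - Pi @ x)          # assumed b_ki
    B = Si - np.outer(b, b)                # assumed B_ki
    vS, vP = vec(dS), vec(dP.T)
    bc, xc = b[:, None], x[:, None]        # column vectors b_ki and x_i
    return (-(dpi @ a) ** 2
            - 0.5 * vS @ np.kron((Si - 2 * B).T, Si) @ vS
            - dg @ np.kron(bc.T, Si) @ vS
            - vP @ np.kron(Si, np.outer(x, b)) @ vS
            - vS @ np.kron(bc, Si) @ dg
            - dg @ Si @ dg
            - dg @ np.kron(Si, xc.T) @ vP
            - vS @ np.kron(Si, np.outer(b, x)) @ vP
            - vP @ np.kron(Si, xc) @ dg
            - vP @ np.kron(Si, np.outer(x, x)) @ vP)

rng = np.random.default_rng(4)
p, q = 3, 2
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)
gamma, Pi = rng.normal(size=p), rng.normal(size=(p, q))
x, y = rng.normal(size=q), rng.normal(size=p)
dg, dP = rng.normal(size=p), rng.normal(size=(p, q))
M = rng.normal(size=(p, p))
dS = 0.1 * (M + M.T)

# Hypothetical weights: (pi_1, pi_2) free, pi_3 = 1 - pi_1 - pi_2, component k = 1
pi_free = np.array([0.2, 0.3])
dpi = np.array([0.05, -0.02])
a = np.array([1.0 / pi_free[0], 0.0])      # a_1 under this parametrisation

def log_f(t):
    """log pi_1 + log phi along the line psi + t * d psi."""
    return (np.log(pi_free[0] + t * dpi[0])
            + log_phi(y, gamma + t * dg + (Pi + t * dP) @ x, Sigma + t * dS))

h = 1e-3
numeric = (log_f(h) - 2 * log_f(0.0) + log_f(-h)) / h**2
analytic = d2_log_f(y, x, a, dpi, gamma, Pi, Sigma, dg, dP, dS)
print(analytic, numeric)                   # should agree closely
```

A directional check of this kind exercises every term of (36) at once without building the full Hessian; assembling the Hessian blocks explicitly would require constructing \({\mathbf {G}}\) as a matrix.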

Appendix 5

5.1 Proof of Proposition 1

The results given in parts (a), (b), (c) and (d) follow immediately from Theorems 2.1, 2.2, 3.2 and 3.3 of White (1982), respectively.

Appendix 6

6.1 Proof of Theorem 3

Let

$$\begin{aligned} {\mathbf{C}} _I({\varvec{\psi }}) = I \cdot \left( H({\varvec{\psi }})\right) ^{-1} \left( \sum _{i=1}^I s_i({\varvec{\psi }})s_i({\varvec{\psi }})' \right) \left( H({\varvec{\psi }})\right) ^{-1}. \end{aligned}$$
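The matrix \({\mathbf{C}} _I({\varvec{\psi }})\) has the familiar sandwich form and is straightforward to compute once per-observation scores and a Hessian are available. The sketch below assumes, as the scaling by \(I\) suggests, that \(H({\varvec{\psi }})\) is the Hessian of the full-sample log-likelihood, i.e. the sum of the \(H_i({\varvec{\psi }})\); the function name and the toy model (a \(N(\mu ,1)\) working model fitted to data whose true variance is 4) are illustrative and are not taken from the paper.

```python
import numpy as np

def sandwich_C_I(H, scores):
    """C_I = I * H^{-1} (sum_i s_i s_i') H^{-1}.

    H      : (d, d) Hessian of the full-sample log-likelihood at psi.
    scores : (I, d) array whose i-th row is s_i(psi)'.
    """
    n_obs = scores.shape[0]
    H_inv = np.linalg.inv(H)
    S = scores.T @ scores                  # sum_i s_i s_i'
    return n_obs * H_inv @ S @ H_inv

# Toy example: working model y_i ~ N(mu, 1), while the true sd is 2
rng = np.random.default_rng(5)
y = 1.0 + 2.0 * rng.normal(size=20_000)
mu_hat = y.mean()                          # ML estimate of mu
scores = (y - mu_hat)[:, None]             # s_i = y_i - mu at mu_hat
H = np.array([[-float(y.size)]])           # sum_i H_i = -I for this model
C_I = sandwich_C_I(H, scores)
print(C_I)                                 # close to the true variance 4
```

In this misspecified toy model \(-I\cdot H^{-1}\) equals 1, whereas \({\mathbf{C}} _I\) tracks the actual sampling variability of \(\sqrt{I}(\hat{\mu }-\mu )\); this robustness is what the White (1982) results invoked in Proposition 1 provide.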

Owing to the properties of the model, the matrices \(H({\varvec{\psi }})\), \({\mathbb {E}}(H_i({\varvec{\psi }}))\), \({\mathbf{C}} ({\varvec{\psi }})\) and \({\mathbf{C}} _I({\varvec{\psi }})\) are block diagonal, with one block for \({\varvec{\vartheta }}\) and one for \({\varvec{\theta }}\). Specifically:

$$\begin{aligned} {\mathbb {E}}(H_i({\varvec{\psi }}))& = \left[ \begin{array}{cc} {\mathbb {E}}(H_i({\varvec{\vartheta }})) &{} {\mathbf 0} \\ {\mathbf 0} &{} {\mathbb {E}}(H_i({\varvec{\theta }})) \\ \end{array} \right] ,\\ {\mathbf{C}} ({\varvec{\psi }})&= \left[ \begin{array}{cc} {\mathbf{C}} ({\varvec{\vartheta }}) &{} {\mathbf 0} \\ {\mathbf 0} &{} {\mathbf{C}} ({\varvec{\theta }})\\ \end{array} \right] ,\\ {\mathbf{C}} _I({\varvec{\psi }})&= \left[ \begin{array}{cc} {\mathbf{C}} _I({\varvec{\vartheta }}) &{} {\mathbf 0} \\ {\mathbf 0} &{} {\mathbf{C}} _I({\varvec{\theta }}) \\ \end{array} \right] , \end{aligned}$$

where

$$\begin{aligned} {\mathbf{C}} _I({\varvec{\vartheta }})&= I \cdot \left( H({\varvec{\vartheta }})\right) ^{-1} \left( \sum _{i=1}^I s_i({\varvec{\vartheta }})s_i({\varvec{\vartheta }})' \right) \left( H({\varvec{\vartheta }})\right) ^{-1}, \\ {\mathbf{C}} _I({\varvec{\theta }})&= I \cdot \left( H({\varvec{\theta }})\right) ^{-1} \left( \sum _{i=1}^I s_i({\varvec{\theta }})s_i({\varvec{\theta }})' \right) \left( H({\varvec{\theta }})\right) ^{-1}, \\ {\mathbf{C}} ({\varvec{\vartheta }})&= \left( {\mathbb {E}}(H_i({\varvec{\vartheta }}))\right) ^{-1} {\mathbb {E}} \left( s_i({\varvec{\vartheta }})s_i({\varvec{\vartheta }})' \right) \left( {\mathbb {E}}(H_i({\varvec{\vartheta }})) \right) ^{-1},\\ {\mathbf{C}} ({\varvec{\theta }})&= \left( {\mathbb {E}}(H_i({\varvec{\theta }})) \right) ^{-1} {\mathbb {E}} \left( s_i({\varvec{\theta }})s_i({\varvec{\theta }})' \right) \left( {\mathbb {E}}(H_i({\varvec{\theta }})) \right) ^{-1}, \end{aligned}$$

with \(s_i({\varvec{\vartheta }})=\frac{\partial l_i({\varvec{\vartheta }})}{\partial {\varvec{\vartheta }}}\) and \(s_i({\varvec{\theta }})=\frac{\partial l_i({\varvec{\theta }})}{\partial {\varvec{\theta }}}\).

Thus, the result given in Eq. (16) follows from Eqs. (12), (14) and (15).
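The block-diagonal structure is what allows \({\mathbf{C}} _I({\varvec{\vartheta }})\) and \({\mathbf{C}} _I({\varvec{\theta }})\) to be computed separately. The sketch below checks numerically that, when \(H\) is block diagonal, the diagonal blocks of the full sandwich coincide with the sandwiches computed block by block. The dimensions, random scores and negative definite Hessian blocks are synthetic placeholders, not quantities from the model, and the helper names are illustrative.

```python
import numpy as np

def sandwich_C_I(H, scores):
    """C_I = I * H^{-1} (sum_i s_i s_i') H^{-1} for an (I, d) score array."""
    n_obs = scores.shape[0]
    H_inv = np.linalg.inv(H)
    return n_obs * H_inv @ (scores.T @ scores) @ H_inv

rng = np.random.default_rng(6)
d1, d2, n_obs = 2, 3, 50                   # sizes of vartheta, theta and the sample
scores = rng.normal(size=(n_obs, d1 + d2))

def neg_def(d):
    """Random negative definite (d, d) matrix standing in for a Hessian block."""
    A = rng.normal(size=(d, d))
    return -(A @ A.T + d * np.eye(d))

H1, H2 = neg_def(d1), neg_def(d2)
H = np.block([[H1, np.zeros((d1, d2))],
              [np.zeros((d2, d1)), H2]])

C_full = sandwich_C_I(H, scores)
C_vartheta = sandwich_C_I(H1, scores[:, :d1])   # block for vartheta
C_theta = sandwich_C_I(H2, scores[:, d1:])      # block for theta
print(np.allclose(C_full[:d1, :d1], C_vartheta), np.allclose(C_full[d1:, d1:], C_theta))
```

Only the diagonal blocks are compared: the synthetic scores are not constructed to have vanishing cross products between the two blocks, so the off-diagonal block of `C_full` is not zero in this example.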

Appendix 7

7.1 Proof of Theorem 4

Since the matrices \(H({\varvec{\psi }})\), \({\mathbb {E}}(H_i({\varvec{\psi }}))\), \({\mathbf{C}} ({\varvec{\psi }})\) and \({\mathbf{C}} _I({\varvec{\psi }})\) have a block-diagonal structure, the result in Eq. (17) follows immediately from Eqs. (12) and (13).


Cite this article

Galimberti, G., Nuzzi, L. & Soffritti, G. Covariance matrix estimation of the maximum likelihood estimator in multivariate clusterwise linear regression. Stat Methods Appl 30, 235–268 (2021). https://doi.org/10.1007/s10260-020-00523-9
