Clustering multivariate functional data in group-specific functional subspaces

Abstract

With the emergence of numerical sensors in many aspects of everyday life, there is an increasing need for methods to analyze multivariate functional data. This work focuses on the clustering of such functional data, in order to ease their modeling and understanding. To this end, a novel clustering technique for multivariate functional data is presented. The method is based on a functional latent mixture model which fits the data into group-specific functional subspaces through a multivariate functional principal component analysis. A family of parsimonious models is obtained by constraining model parameters within and between groups. An expectation-maximization (EM) algorithm is proposed for model inference, and the choice of hyper-parameters is addressed through model selection. Numerical experiments on simulated datasets highlight the good performance of the proposed methodology compared with existing methods. The algorithm is then applied to the analysis of pollution in French cities over one year.

Notes

  1. https://www.who.int/airpollution/en/.

  2. https://www.airpaca.org/donnees/telecharger, https://www.atmo-auvergnerhonealpes.fr/donnees/telech, https://www.atmo-nouvelleaquitaine.org/donnees/telecharger.

References

  • Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19(6):716–723

  • Basso RM, Lachos VH, Cabral CRB, Ghosh P (2010) Robust mixture modeling based on scale mixtures of skew-normal distributions. Comput Stat Data Anal 54(12):2926–2941

  • Berrendero J, Justel A, Svarc M (2011) Principal components for multivariate functional data. Comput Stat Data Anal 55(9):2619–2634

  • Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22(7):719–725

  • Birgé L, Massart P (2007) Minimal penalties for Gaussian model selection. Probab Theory Relat Fields 138:33–73

  • Bongiorno EG, Goia A (2016) Classification methods for Hilbert data based on surrogate density. Comput Stat Data Anal 99(C):204–222

  • Bouveyron C, Jacques J (2011) Model-based clustering of time series in group-specific functional subspaces. Adv Data Anal Classif 5(4):281–300

  • Bouveyron C, Côme E, Jacques J (2015) The discriminative functional mixture model for the analysis of bike sharing systems. Ann Appl Stat 9(4):1726–1760

  • Bouveyron C, Celeux G, Murphy T, Raftery A (2019) Model-based clustering and classification for data science: with applications in R. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge

  • Byers S, Raftery AE (1998) Nearest-neighbor clutter removal for estimating features in spatial point processes. J Am Stat Assoc 93(442):577–584

  • Cattell R (1966) The scree test for the number of factors. Multivar Behav Res 1(2):245–276

  • Chen L, Jiang C (2016) Multi-dimensional functional principal component analysis. Stat Comput 27:1181–1192

  • Chiou J, Chen Y, Yang Y (2014) Multivariate functional principal component analysis: a normalization approach. Stat Sin 24:1571–1596

  • Chiou JM, Li PL (2007) Functional clustering and identifying substructures of longitudinal data. J R Stat Soc Ser B Stat Methodol 69(4):679–699

  • Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39(1):1–38

  • Ferraty F, Vieu P (2003) Curves discrimination: a nonparametric approach. Comput Stat Data Anal 44:161–173

  • Gallegos MT, Ritter G (2005) A robust method for cluster analysis. Ann Stat 33(1):347–380

  • Gallegos MT, Ritter G (2009) Trimming algorithms for clustering contaminated grouped data and their robustness. Adv Data Anal Classif 3:135–167

  • Hennig C, Coretto P (2007) The noise component in model-based cluster analysis. Springer, Berlin, pp 127–138

  • Ieva F, Paganoni AM (2016) Risk prediction for myocardial infarction via generalized functional regression models. Stat Methods Med Res 25:1648–1660

  • Ieva F, Paganoni A, Pigoli D, Vitelli V (2013) Multivariate functional clustering for the morphological analysis of ECG curves. J R Stat Soc Ser C (Appl Stat) 62(3):401–418

  • Jacques J, Preda C (2013) Funclust: a curves clustering method using functional random variable density approximation. Neurocomputing 112:164–171

  • Jacques J, Preda C (2014a) Functional data clustering: a survey. Adv Data Anal Classif 8(3):231–255

  • Jacques J, Preda C (2014b) Model-based clustering for multivariate functional data. Comput Stat Data Anal 71:92–106

  • James G, Sugar C (2003) Clustering for sparsely sampled functional data. J Am Stat Assoc 98(462):397–408

  • Kayano M, Dozono K, Konishi S (2010) Functional cluster analysis via orthonormalized Gaussian basis expansions and its application. J Classif 27:211–230

  • Petersen KB, Pedersen MS (2012) The matrix cookbook. http://www2.imm.dtu.dk/pubdb/p.php?3274, version 20121115

  • Preda C (2007) Regression models for functional data by reproducing kernel Hilbert spaces methods. J Stat Plan Inference 137:829–840

  • R Core Team (2017) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

  • Ramsay JO, Silverman BW (2005) Functional data analysis, 2nd edn. Springer Series in Statistics. Springer, New York

  • Rand WM (1971) Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66(336):846–850

  • Saporta G (1981) Méthodes exploratoires d'analyse de données temporelles. Cahiers du Bureau universitaire de recherche opérationnelle, Série Recherche 37–38:7–194

  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464

  • Singhal A, Seborg D (2005) Clustering multivariate time-series data. J Chemom 19:427–438

  • Tarpey T, Kinateder K (2003) Clustering functional data. J Classif 20(1):93–114

  • Tokushige S, Yadohisa H, Inada K (2007) Crisp and fuzzy k-means clustering algorithms for multivariate functional data. Comput Stat 22:1–16

  • Traore OI, Cristini P, Favretto-Cristini N, Pantera L, Vieu P, Viguier-Pla S (2019) Clustering acoustic emission signals by mixing two stages dimension reduction and nonparametric approaches. Comput Stat 34(2):631–652

  • Yamamoto M (2012) Clustering of functional data in a low-dimensional subspace. Adv Data Anal Classif 6:219–247

  • Yamamoto M, Terada Y (2014) Functional factorial k-means analysis. Comput Stat Data Anal 79:133–148

  • Yamamoto M, Hwang H (2017) Dimension-reduced clustering of functional data via subspace separation. J Classif 34:294–326

  • Zambom AZ, Collazos JA, Dias R (2019) Functional data clustering via hypothesis testing k-means. Comput Stat 34(2):527–549

Author information

Corresponding author

Correspondence to Amandine Schmutz.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We would like to thank the LabCom ’CWD-VetLab’ for its financial support. The LabCom ’CWD-VetLab’ is financially supported by the Agence Nationale de la Recherche (Contract ANR 16-LCV2-0002-01).

Appendix: Proofs

1.1 Proof of Proposition 1

The complete log-likelihood of the mixture model is written:

$$\begin{aligned} l(\theta )=\sum _{i=1}^n\sum _{k=1}^Kz_{ki}log(\pi _k f(x_i,\theta _k)), \end{aligned}$$

where \(z_{ki}=1\) if \(x_i\) belongs to cluster k and \(z_{ki}=0\) otherwise, and \(f(x_i,\theta _k)\) is a Gaussian density with parameters \(\theta _k=\{\mu _k,\Sigma _k\}\). The complete log-likelihood therefore expands as:

$$\begin{aligned} l(\theta )= & {} \sum _{i=1}^n\sum _{k=1}^Kz_{ki}log[\pi _k\frac{1}{(2\pi )^{R/2}|\Sigma _k|^{1/2}} exp(\frac{-1}{2}(x_i-\mu _k)^t \Sigma _k^{-1}(x_i-\mu _k))] \\= & {} \sum _{i=1}^n\sum _{k=1}^Kz_{ki}[log(\pi _k)- \frac{1}{2}log|\Sigma _k|-\frac{1}{2}(x_i-\mu _k)^t\Sigma _k^{-1}(x_i-\mu _k)-\frac{R}{2}log(2\pi )]. \end{aligned}$$

For the \([a_{kj}b_kQ_kd_k]\) model, we have:

$$\begin{aligned} l(\theta )= & {} -\frac{1}{2}\sum _{i=1}^{n}\sum _{k=1}^Kz_{ki}\left[ -2log(\pi _k)+log\left( \prod _{j=1}^{d_k}a_{kj}\prod _{j=d_k+1}^Rb_k\right) +(x_i-\mu _k)^tQ_k\Delta _k^{-1}Q_k^t(x_i-\mu _k)\right] \\&\quad -\,\frac{nR}{2}log(2\pi ) \\= & {} -\frac{1}{2}\sum _{i=1}^{n}\sum _{k=1}^Kz_{ki}\left[ -2log(\pi _k)+\sum _{j=1}^{d_k} log(a_{kj})+\sum _{j=d_k+1}^Rlog(b_k)+(x_i-\mu _k)^tQ_k\Delta _k^{-1}Q_k^t(x_i-\mu _k)\right] \\&\quad -\,\frac{nR}{2}log(2\pi ). \end{aligned}$$

Let \(n_k=\sum _{i=1}^nz_{ki}\) denote the number of curves in cluster k; the complete log-likelihood then becomes:

$$\begin{aligned} l(\theta )= & {} -\frac{1}{2}\sum _{k=1}^Kn_k[-2log(\pi _k)+\sum _{j=1}^{d_k}log(a_{kj})+\sum _{j={d_k+1}}^Rlog(b_k) \\&\quad +\,\frac{1}{n_k}\sum _{i=1}^nz_{ki}(x_i-\mu _k)^tQ_k\Delta _k^{-1}Q_k^t(x_i-\mu _k)]-\frac{nR}{2}log(2\pi ). \end{aligned}$$

The quantity \((x_i-\mu _k)^tQ_k\Delta _k^{-1}Q_k^t(x_i-\mu _k)\) is a scalar, so it is equal to its trace:

$$\begin{aligned} \frac{1}{n_k}\sum _{i=1}^nz_{ki}(x_i-\mu _k)^tQ_k\Delta _k^{-1}Q_k^t(x_i-\mu _k)=\frac{1}{n_k}\sum _{i=1}^nz_{ki}tr((x_i-\mu _k)^tQ_k\Delta _k^{-1}Q_k^t(x_i-\mu _k)). \end{aligned}$$

Since \(tr(AB)=tr(BA)\), we have \(tr([(x_i-\mu _k)^tQ_k]\times [\Delta _k^{-1}Q_k^t(x_i-\mu _k)])=tr([\Delta _k^{-1}Q_k^t(x_i-\mu _k)]\times [(x_i-\mu _k)^tQ_k])\), and consequently:

$$\begin{aligned} \frac{1}{n_k}\sum _{i=1}^nz_{ki}(x_i-\mu _k)^tQ_k\Delta _k^{-1}Q_k^t(x_i-\mu _k)= & {} \frac{1}{n_k}\sum _{i=1}^nz_{ki}tr(\Delta _k^{-1}Q_k^t(x_i-\mu _k)(x_i-\mu _k)^tQ_k) \\= & {} tr(\Delta _k^{-1}Q_k^t[\frac{1}{n_k}\sum _{i=1}^nz_{ki}(x_i-\mu _k)(x_i-\mu _k)^t]Q_k)\\= & {} tr(\Delta _k^{-1}Q_k^tC_kQ_k), \end{aligned}$$

where \(C_k=\frac{1}{n_k}\sum _{i=1}^{n}z_{ki}(x_i-\mu _k)(x_i-\mu _k)^t\) is the empirical covariance matrix of the kth component of the mixture model. The \(\Delta _k\) matrix is diagonal, so we can write:

$$\begin{aligned} \frac{1}{n_k}\sum _{i=1}^nz_{ki}(x_i-\mu _k)^tQ_k\Delta _k^{-1}Q_k^t(x_i-\mu _k)= & {} \sum _{j=1}^{d_k}\frac{q_{kj}^tW^{1/2}C_kW^{1/2}q_{kj}}{a_{kj}} \\&\quad +\,\sum _{j={d_k+1}}^R\frac{q_{kj}^tW^{1/2}C_kW^{1/2}q_{kj}}{b_k}, \end{aligned}$$

where \(q_{kj}\) is the jth column of \(Q_k\). Finally,

$$\begin{aligned} l(\theta )= & {} -\frac{1}{2}\sum _{k=1}^Kn_k[-2log(\pi _k)+\sum _{j=1}^{d_k}log(a_{kj})+\sum _{j={d_k+1}}^Rlog(b_k)+\sum _{j=1}^{d_k}\frac{q_{kj}^tW^{1/2}C_kW^{1/2}q_{kj}}{a_{kj}} \\&\quad +\,\sum _{j={d_k+1}}^R\frac{q_{kj}^tW^{1/2}C_kW^{1/2}q_{kj}}{b_k}]-\frac{nR}{2}log(2\pi ). \end{aligned}$$
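
As a numerical sanity check of Proposition 1, the following NumPy sketch (ours, not part of the original proof; all names and toy values are assumptions, and W is taken as the identity for simplicity) simulates hard-assigned Gaussian data with \(\Sigma _k=Q_k\Delta _kQ_k^t\) and compares the direct complete log-likelihood with the closed form above:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
R, K, n, d = 5, 2, 200, 2                 # R coefficients, K clusters, d_k = d

# toy parameters: orthogonal Q_k, Delta_k = diag(a_k1..a_kd, b_k, ..., b_k)
pi = np.array([0.4, 0.6])
mu = rng.normal(size=(K, R))
Q = [np.linalg.qr(rng.normal(size=(R, R)))[0] for _ in range(K)]
a = np.array([[5.0, 3.0], [6.0, 2.5]])    # a_kj, j = 1..d
b = np.array([0.5, 0.7])                  # b_k
Sigma = [Q[k] @ np.diag(np.r_[a[k], b[k] * np.ones(R - d)]) @ Q[k].T
         for k in range(K)]

z = rng.integers(0, K, size=n)            # hard assignments z_ik
x = np.array([rng.multivariate_normal(mu[k], Sigma[k]) for k in z])

# direct complete log-likelihood: sum_i log(pi_{z_i} f(x_i; theta_{z_i}))
l_direct = sum(np.log(pi[k]) + multivariate_normal.logpdf(x[i], mu[k], Sigma[k])
               for i, k in enumerate(z))

# closed form of Proposition 1 (with W = I)
l_prop = -0.5 * n * R * np.log(2 * np.pi)
for k in range(K):
    idx = z == k
    n_k = idx.sum()
    C_k = (x[idx] - mu[k]).T @ (x[idx] - mu[k]) / n_k      # empirical covariance
    delta = np.r_[a[k], b[k] * np.ones(R - d)]             # diagonal of Delta_k
    quad = sum(Q[k][:, j] @ C_k @ Q[k][:, j] / delta[j] for j in range(R))
    l_prop += -0.5 * n_k * (-2 * np.log(pi[k]) + np.log(delta).sum() + quad)

print(np.allclose(l_direct, l_prop))      # True
```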

1.2 Proof of Proposition 2

$$\begin{aligned} H_k(x)= & {} -2log(\pi _k f(x,\theta _k)) \\= & {} -2log(\pi _k) -2log(f(x,\theta _k)) \\= & {} -2log(\pi _k) - 2log\left( \frac{1}{(2\pi )^{R/2}|\Sigma _k|^{1/2}} exp\left( \frac{-1}{2}(x-\mu _k)^t \Sigma _k^{-1}(x-\mu _k)\right) \right) \\= & {} -2log(\pi _k) - 2log\left( \frac{1}{(2\pi )^{R/2}|\Sigma _k|^{1/2}}\right) + (x-\mu _k)^t \Sigma _k^{-1}(x-\mu _k) \\= & {} -2log(\pi _k) + Rlog(2\pi ) +log|\Sigma _k|+(x-\mu _k)^t\Sigma _k^{-1}(x-\mu _k). \end{aligned}$$

Now, \(\Sigma _k=Q_k\Delta _k Q_k^t\) and \(Q_k^tQ_k=I_R\), hence:

$$\begin{aligned} H_k(x) = -2log(\pi _k)+Rlog(2\pi )+log|\Sigma _k|+(x-\mu _k)^t(Q_k\Delta _k Q_k^t)^{-1}(x-\mu _k). \end{aligned}$$

Let \(Q_k=\tilde{Q_k}+\bar{Q_k}\), where \(\tilde{Q_k}\) is the \(R\times R\) matrix containing the first \(d_k\) columns of \(Q_k\), completed by zeros, and where \(\bar{Q_k}=Q_k-\tilde{Q_k}\). Notice that \(\tilde{Q_k}\Delta _k^{-1}\bar{Q_k^t}=\bar{Q_k}\Delta _k^{-1}\tilde{Q_k^t}=O_R\), where \(O_R\) is the null matrix. So,

$$\begin{aligned} Q_k\Delta _k^{-1}Q_k^t= (\tilde{Q_k}+\bar{Q_k})\Delta _k^{-1}(\tilde{Q_k}+\bar{Q_k})^t= \tilde{Q_k}\Delta _k^{-1}\tilde{Q_k^t}+\bar{Q_k}\Delta _k^{-1}\bar{Q_k^t}. \end{aligned}$$

Hence,

$$\begin{aligned} H_k(x)= & {} -2log(\pi _k)+Rlog(2\pi )+log|\Sigma _k|+(x-\mu _k)^t\tilde{Q_k}\Delta _k^{-1}\tilde{Q_k^t}(x-\mu _k) \\&\quad +\,(x-\mu _k)^t\bar{Q_k}\Delta _k^{-1}\bar{Q_k^t}(x-\mu _k). \end{aligned}$$

Using the identities \(\tilde{Q_k}[\tilde{Q_k^t}\tilde{Q_k}]=\tilde{Q_k}\) and \(\bar{Q_k}[\bar{Q_k^t}\bar{Q_k}]=\bar{Q_k}\), we can rewrite \(H_k(x)\) as:

$$\begin{aligned} H_k(x)= & {} -2log(\pi _k)+Rlog(2\pi )+log|\Sigma _k|+(x-\mu _k)^t\tilde{Q_k}\tilde{Q_k^t}\tilde{Q_k}\Delta _k^{-1}\tilde{Q_k^t}\tilde{Q_k}\tilde{Q_k^t}(x-\mu _k) \\&\quad +\,(x-\mu _k)^t\bar{Q_k}\bar{Q_k^t}\bar{Q_k}\Delta _k^{-1}\bar{Q_k^t}\bar{Q_k}\bar{Q_k^t}(x-\mu _k) \\= & {} -2log(\pi _k)+Rlog(2\pi )+log|\Sigma _k|+[\tilde{Q_k}\tilde{Q_k^t}(x-\mu _k)]^t\tilde{Q_k}\Delta _k^{-1}\tilde{Q_k}^t[\tilde{Q_k}\tilde{Q_k^t}(x-\mu _k)] \\&\quad + \,[\bar{Q_k}\bar{Q_k^t}(x-\mu _k)]^t\bar{Q_k}\Delta _k^{-1}\bar{Q_k}^t[\bar{Q_k}\bar{Q_k^t}(x-\mu _k)]. \end{aligned}$$

We define \(\mathcal {D}_k=\tilde{Q_k}\Delta _k^{-1}\tilde{Q_k^t}\) and the associated norm \(||\cdot ||_{\mathcal {D}_k}\) on \(\mathbb {E}_k\) such that \(||x||_{\mathcal {D}_k}^2=x^t\mathcal {D}_kx\). So, on the one hand:

$$\begin{aligned}{}[\tilde{Q_k}\tilde{Q_k^t}(x-\mu _k)]^t\tilde{Q_k}\Delta _k^{-1}\tilde{Q_k^t}[\tilde{Q_k}\tilde{Q_k^t}(x-\mu _k)]= ||\tilde{Q_k}\tilde{Q_k^t}(x-\mu _k)||_{\mathcal {D}_k}^2. \end{aligned}$$

On the other hand:

$$\begin{aligned}{}[\bar{Q_k}\bar{Q_k^t}(x-\mu _k)]^t\bar{Q_k}\Delta _k^{-1}\bar{Q_k^t}[\bar{Q_k}\bar{Q_k^t}(x-\mu _k)]=\frac{1}{b_k}||\bar{Q_k}\bar{Q_k^t}(x-\mu _k)||^2. \end{aligned}$$

Consequently,

$$\begin{aligned} H_k(x)=-2log(\pi _k)+Rlog(2\pi )+log|\Sigma _k|+||\tilde{Q_k}\tilde{Q_k^t}(x-\mu _k)||_{\mathcal {D}_k}^2+\frac{1}{b_k}||\bar{Q_k}\bar{Q_k^t}(x-\mu _k)||^2. \end{aligned}$$

Since \(\tilde{Q_k}\tilde{Q_k^t}+\bar{Q_k}\bar{Q_k^t}=Q_kQ_k^t=I_R\), let \(P_k(x)=\tilde{Q_k}\tilde{Q_k^t}(x-\mu _k)+\mu _k\) denote the projection of x onto the subspace \(\mathbb {E}_k\). Then \(\tilde{Q_k}\tilde{Q_k^t}(x-\mu _k)=P_k(x)-\mu _k\) and \(\bar{Q_k}\bar{Q_k^t}(x-\mu _k)=x-P_k(x)\), so that:

$$\begin{aligned} H_k(x)= ||\mu _k-P_k(x)||^2_{\mathcal {D}_k}+\frac{1}{b_k}||x-P_k(x)||^2+log|\Sigma _k|-2log(\pi _k)+Rlog(2\pi ). \end{aligned}$$

Moreover, \(log|\Sigma _k|=\sum _{j=1}^{d_k}log(a_{kj})+(R-d_k)log(b_k)\). Finally,

$$\begin{aligned} H_k(x)= & {} ||\mu _k-P_k(x)||^2_{\mathcal {D}_k}+\frac{1}{b_k}||x-P_k(x)||^2+\sum _{j=1}^{d_k}log(a_{kj})+(R-d_k)log(b_k)-2log(\pi _k) \\&\quad \,+Rlog(2\pi ). \end{aligned}$$
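
Proposition 2 is what makes the E-step inexpensive: \(H_k(x)=-2log(\pi _k f(x,\theta _k))\) is evaluated from projections onto the group subspace, without inverting \(\Sigma _k\). Below is a minimal sketch (ours; the parameter layout and function names are assumptions, with W again taken as the identity) of computing \(H_k\) and the resulting posterior probabilities \(t_k(x)=exp(-H_k(x)/2)/\sum _l exp(-H_l(x)/2)\):

```python
import numpy as np

def H_k(x, pi_k, mu_k, Q_k, a_k, b_k, d_k):
    """Cost H_k(x) = -2 log(pi_k f(x; theta_k)) via Proposition 2."""
    R = len(mu_k)
    Q_tilde = Q_k[:, :d_k]                 # tilde{Q}_k: first d_k eigenvectors
    dev = x - mu_k
    scores = Q_tilde.T @ dev               # coordinates of P_k(x) - mu_k on E_k
    resid = dev - Q_tilde @ scores         # x - P_k(x)
    return (np.sum(scores**2 / a_k)        # ||mu_k - P_k(x)||^2_{D_k}
            + np.sum(resid**2) / b_k       # ||x - P_k(x)||^2 / b_k
            + np.log(a_k).sum() + (R - d_k) * np.log(b_k)
            - 2 * np.log(pi_k) + R * np.log(2 * np.pi))

def posteriors(x, params):
    """t_k(x) = exp(-H_k(x)/2) / sum_l exp(-H_l(x)/2), computed stably."""
    H = np.array([H_k(x, *p) for p in params])
    w = np.exp(-(H - H.min()) / 2)         # subtract min(H) to avoid underflow
    return w / w.sum()
```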

1.3 Proof of Proposition 3

Parameter \(Q_k\). We have to maximize the log-likelihood under the constraint \(q_{kj}^tq_{kj}=1\), which is equivalent to looking for a saddle point of the Lagrange function:

$$\begin{aligned} \mathcal {L}=-2l(\theta )-\sum _{j=1}^R \omega _{kj}(q_{kj}^tq_{kj}-1), \end{aligned}$$

where the \(\omega _{kj}\) are Lagrange multipliers. So we can write:

$$\begin{aligned} \mathcal {L}&=\sum _{k=1}^K \eta _k\left[ \sum _{j=1}^{d_k}\left( log(a_{kj})+\frac{q_{kj}^tW^{1/2}C_kW^{1/2}q_{kj}}{a_{kj}}\right) \right. \\&\left. \quad +\,\sum _{j=d_k+1}^R\left( log(b_k)+\frac{q_{kj}^tW^{1/2}C_kW^{1/2}q_{kj}}{b_k}\right) -2log(\pi _k)\right] +nRlog(2\pi ) \\&\quad -\,\sum _{j=1}^R\omega _{kj}(q_{kj}^tq_{kj}-1). \end{aligned}$$

Therefore, the gradient of \(\mathcal {L}\) with respect to \(q_{kj}\) is:

$$\begin{aligned} \nabla _{q_{kj}}\mathcal {L}&=\nabla _{q_{kj}}\left( \sum _{k=1}^K\eta _k\left[ \sum _{j=1}^{d_k}\frac{q_{kj}^tW^{1/2}C_kW^{1/2}q_{kj}}{a_{kj}}+\sum _{j=d_k+1}^{R}\frac{q_{kj}^tW^{1/2}C_kW^{1/2}q_{kj}}{b_k}\right] \right. \\&\left. \quad - \,\sum _{j=1}^R\omega _{kj}(q_{kj}^tq_{kj}-1)\right) . \end{aligned}$$

As a reminder, when W is symmetric, \(\frac{\partial }{\partial x}(x-s)^tW(x-s)=2W(x-s)\) and \(\frac{\partial }{\partial x}(x^tx)=2x\) (cf. Petersen and Pedersen 2012), so:

$$\begin{aligned} \nabla _{q_{kj}}\mathcal {L}= & {} \eta _k\left[ 2\frac{W^{1/2}C_kW^{1/2}}{\sigma _{kj}}q_{kj}\right] -2\omega _{kj}q_{kj} \end{aligned}$$

where \(\sigma _{kj}\) is the jth diagonal term of the matrix \(\Delta _k\).

Setting \(\nabla _{q_{kj}}\mathcal {L}=0\) and left-multiplying by \(q_{kj}^t\), with \(q_{kj}^tq_{kj}=1\), gives:

$$\begin{aligned} q_{kj}^t\nabla _{q_{kj}}\mathcal {L}=0\Leftrightarrow & {} \omega _{kj}=\frac{\eta _k}{\sigma _{kj}}q_{kj}^tW^{1/2}C_kW^{1/2}q_{kj} \\\Leftrightarrow & {} W^{1/2}C_kW^{1/2}q_{kj}=\frac{\omega _{kj}\sigma _{kj}}{\eta _k}q_{kj}. \end{aligned}$$

\(q_{kj}\) is therefore an eigenvector of \(W^{1/2}C_kW^{1/2}\), associated with the eigenvalue \(\lambda _{kj}=\frac{\omega _{kj}\sigma _{kj}}{\eta _k}=q_{kj}^tW^{1/2}C_kW^{1/2}q_{kj}\). Moreover, \(q_{kj}^tq_{kl}=0\) if \(j\ne l\). So the log-likelihood can be written:

$$\begin{aligned} -2l(\theta )=\sum _{k=1}^K\eta _k\left[ \sum _{j=1}^{d_k}\left( log(a_{kj})+\frac{\lambda _{kj}}{a_{kj}}\right) +\sum _{j=d_k+1}^R\left( log(b_k)+\frac{\lambda _{kj}}{b_k}\right) \right] +C^{te}, \end{aligned}$$

Substituting \(\sum _{j=d_k+1}^R\lambda _{kj}=tr(W^{1/2}C_kW^{1/2})-\sum _{j=1}^{d_k}\lambda _{kj}\) into this expression yields:

$$\begin{aligned} -2l(\theta )= & {} \sum _{k=1}^K\eta _k\left[ \sum _{j=1}^{d_k}log(a_{kj})+\sum _{j=1}^{d_k}\frac{\lambda _{kj}}{a_{kj}}+\sum _{j=d_k+1}^Rlog(b_k)+\frac{1}{b_k}\left( tr(W^{1/2}C_kW^{1/2})-\sum _{j=1}^{d_k}\lambda _{kj}\right) \right] +C^{te} \\= & {} \sum _{k=1}^K\eta _k\left[ \sum _{j=1}^{d_k}log(a_{kj})+\sum _{j=1}^{d_k}\lambda _{kj}\left( \frac{1}{a_{kj}}-\frac{1}{b_k}\right) +\sum _{j=d_k+1}^Rlog(b_k)+\frac{1}{b_k}tr(W^{1/2}C_kW^{1/2})\right] +C^{te} \\= & {} \sum _{k=1}^K\eta _k\left[ \sum _{j=1}^{d_k}log(a_{kj})+\sum _{j=1}^{d_k}\lambda _{kj}\left( \frac{1}{a_{kj}}-\frac{1}{b_k}\right) +(R-d_k)log(b_k)+\frac{tr(W^{1/2}C_kW^{1/2})}{b_k}\right] +C^{te}. \end{aligned}$$

In order to minimize \(-2l(\theta )\) with respect to \(q_{kj}\), we minimize the quantity \(\sum _{k=1}^K\eta _k\sum _{j=1}^{d_k}\lambda _{kj}(\frac{1}{a_{kj}}-\frac{1}{b_k})\) with respect to \(\lambda _{kj}\). Since \((\frac{1}{a_{kj}}-\frac{1}{b_k})\le 0\) for all \(j=1,\ldots ,d_k\), each \(\lambda _{kj}\) has to be as large as possible. Hence, the jth column \(q_{kj}\) of \(Q_k\) is estimated by the eigenfunction associated with the jth largest eigenvalue of \(W^{1/2}C_kW^{1/2}\).
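
In practice, the update of \(Q_k\) thus reduces to one symmetric eigen-decomposition per group. A minimal sketch (ours; the function name is an assumption, and W stands for the inner-product matrix used in the main text, assumed symmetric positive definite):

```python
import numpy as np

def update_Qk(C_k, W):
    """Eigenvectors of W^{1/2} C_k W^{1/2}, sorted by decreasing eigenvalue.

    Returns (Q_k, lam); the first d_k columns of Q_k estimate tilde{Q}_k.
    """
    w_vals, w_vecs = np.linalg.eigh(W)
    W_half = w_vecs @ np.diag(np.sqrt(w_vals)) @ w_vecs.T   # symmetric sqrt of W
    lam, Q = np.linalg.eigh(W_half @ C_k @ W_half)          # ascending eigenvalues
    order = np.argsort(lam)[::-1]                           # sort descending
    return Q[:, order], lam[order]
```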

Parameter \(a_{kj}\). As a reminder, \((ln(x))'=\frac{x'}{x}\) and \((\frac{1}{x})'=-\frac{1}{x^2}\). The partial derivative of \(l(\theta )\) with respect to \(a_{kj}\) is:

$$\begin{aligned} -2\frac{\partial l(\theta )}{\partial a_{kj}}=\eta _k\left( \frac{1}{a_{kj}}-\frac{q_{kj}^tW^{1/2}C_kW^{1/2}q_{kj}}{a_{kj}^2}\right) \end{aligned}$$

The condition \(\frac{\partial l(\theta )}{\partial a_{kj}}=0\) is equivalent to:

$$\begin{aligned}&\eta _k\left( \frac{1}{a_{kj}}-\frac{q_{kj}^tW^{1/2}C_kW^{1/2}q_{kj}}{a_{kj}^2}\right) =0 \\&\quad \Leftrightarrow \frac{1}{a_{kj}}=\frac{q_{kj}^tW^{1/2}C_kW^{1/2}q_{kj}}{a_{kj}^2} \\&\quad \Leftrightarrow a_{kj}=q_{kj}^tW^{1/2}C_kW^{1/2}q_{kj} \\&\quad \Leftrightarrow a_{kj}= \lambda _{kj} \end{aligned}$$
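
In code (continuing the sketch above, where lam holds the eigenvalues of \(W^{1/2}C_kW^{1/2}\) in decreasing order), this update is a single slice:

```python
a_k = lam[:d_k]   # a_kj = lambda_kj, the d_k largest eigenvalues
```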

Parameter \(b_k\). The partial derivative of \(l(\theta )\) with respect to \(b_k\) is:

$$\begin{aligned} -2\frac{\partial l(\theta )}{\partial b_{k}}= & {} \eta _k\sum _{j=d_k+1}^{R}\left( \frac{1}{b_{k}}-\frac{q_{kj}^tW^{1/2}C_kW^{1/2}q_{kj}}{b_{k}^2}\right) \\= & {} \eta _k\left( \frac{R-d_k}{b_k}-\sum _{j={d_k+1}}^R\frac{q_{kj}^tW^{1/2}C_kW^{1/2}q_{kj}}{b_k^2}\right) \end{aligned}$$

But,

$$\begin{aligned} \sum _{j=d_k+1}^R q_{j}^tW^{1/2}C_kW^{1/2}q_j=tr(W^{1/2}C_kW^{1/2})-\sum _{j=1}^{d_k}q_j^tW^{1/2}C_kW^{1/2}q_j, \end{aligned}$$

so:

$$\begin{aligned} -2\frac{\partial l(\theta )}{\partial b_{k}}= & {} \eta _k\frac{(R-d_k)}{b_k}-\frac{\eta _k}{b_k^2}\left( tr(W^{1/2}C_kW^{1/2})-\sum _{j=1}^{d_k}q_{kj}^tW^{1/2}C_kW^{1/2}q_{kj}\right) \\= & {} \eta _k\frac{(R-d_k)}{b_k}-\frac{\eta _k}{b_k^2}\left( tr(W^{1/2}C_kW^{1/2})-\sum _{j=1}^{d_k}\lambda _{kj}\right) \end{aligned}$$

The condition \(\frac{\partial l(\theta )}{\partial b_{k}}=0\) is equivalent to:

$$\begin{aligned}&\eta _k\frac{(R-d_k)}{b_k}-\frac{\eta _k}{b_k^2}\left( tr(W^{1/2}C_kW^{1/2})-\sum _{j=1}^{d_k}\lambda _{kj}\right) =0 \\&\quad \Leftrightarrow \eta _k\frac{(R-d_k)}{b_k}=\frac{\eta _k}{b_k^2}\left( tr(W^{1/2}C_kW^{1/2})-\sum _{j=1}^{d_k}\lambda _{kj}\right) \\&\quad \Leftrightarrow b_k=\frac{\eta _k}{\eta _k(R-d_k)}\left( tr(W^{1/2}C_kW^{1/2})-\sum _{j=1}^{d_k}\lambda _{kj}\right) \\&\quad \Leftrightarrow b_k=\frac{1}{(R-d_k)}\left( tr(W^{1/2}C_kW^{1/2})-\sum _{j=1}^{d_k}\lambda _{kj}\right) \end{aligned}$$
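
This closed form completes the M-step updates for the \([a_{kj}b_kQ_kd_k]\) model. A minimal sketch (ours), again reusing the sorted eigenvalues lam from update_Qk and the fact that \(tr(W^{1/2}C_kW^{1/2})\) equals the sum of all its eigenvalues:

```python
import numpy as np

def update_bk(lam, d_k):
    """b_k = (tr(W^{1/2} C_k W^{1/2}) - sum_{j<=d_k} lambda_kj) / (R - d_k)."""
    R = len(lam)
    return (lam.sum() - lam[:d_k].sum()) / (R - d_k)
```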


About this article

Cite this article

Schmutz, A., Jacques, J., Bouveyron, C. et al. Clustering multivariate functional data in group-specific functional subspaces. Comput Stat 35, 1101–1131 (2020). https://doi.org/10.1007/s00180-020-00958-4
