
A von Mises–Fisher mixture model for clustering numerical and categorical variables

Regular Article · Advances in Data Analysis and Classification

Abstract

This work presents a mixture model for clustering variables of different types. All variables being measured on the same n statistical units, we first represent each variable by a unit-norm operator in \(\mathbb{R}^{n\times n}\) endowed with an appropriate inner product. We then propose a von Mises–Fisher mixture model on the unit sphere containing these operators. The parameters of the mixture model are estimated with an EM algorithm, combined with a K-means procedure to obtain a good starting point. The method is tested on simulated data and finally applied to wine data.
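At a high level, the estimation combines a spherical K-means initialisation with EM updates for a von Mises–Fisher mixture. The Python sketch below is our illustration, not the authors' code: it implements the generic recipe of Banerjee et al. (2005) for unit vectors, including their standard concentration approximation, and omits the paper's coding of variables as operators on the sphere of \(\mathbb{R}^{n\times n}\); all names are illustrative.

```python
# Minimal sketch (ours, not the authors' code): EM for a von Mises-Fisher
# mixture on unit vectors, initialised by a spherical K-means pass.
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel function

def log_vmf(X, mu, kappa):
    """Log-density of vMF(mu, kappa) at the unit row vectors of X."""
    d = X.shape[1]
    # log normalising constant; ive(v, k) = iv(v, k) * exp(-k) avoids overflow
    log_c = ((d / 2 - 1) * np.log(kappa) - (d / 2) * np.log(2 * np.pi)
             - (np.log(ive(d / 2 - 1, kappa)) + kappa))
    return log_c + kappa * (X @ mu)

def fit_movmf(X, K, n_iter=100, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # --- spherical K-means initialisation (good starting point) ---
    mu = X[rng.choice(n, K, replace=False)]
    for _ in range(20):
        z = np.argmax(X @ mu.T, axis=1)          # nearest mean direction
        for k in range(K):
            m = X[z == k].sum(axis=0)
            if np.linalg.norm(m) > 0:
                mu[k] = m / np.linalg.norm(m)
    pi, kappa = np.full(K, 1.0 / K), np.full(K, 10.0)
    # --- EM iterations ---
    for _ in range(n_iter):
        # E-step: posterior membership probabilities
        logp = np.log(pi) + np.column_stack(
            [log_vmf(X, mu[k], kappa[k]) for k in range(K)])
        logp -= logp.max(axis=1, keepdims=True)
        gamma = np.exp(logp)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: mixture weights, mean directions, concentrations
        Nk = gamma.sum(axis=0)
        pi = Nk / n
        R = gamma.T @ X                          # resultant vectors per cluster
        r_norm = np.linalg.norm(R, axis=1)
        mu = R / r_norm[:, None]
        rbar = r_norm / Nk
        kappa = (rbar * d - rbar ** 3) / (1 - rbar ** 2)  # standard approximation
    return pi, mu, kappa, gamma
```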



Author information

Correspondence to Lionel Cucala.


Appendix: proofs

a) \(\Phi^2(X,Y)=\mathrm{tr}(\Pi_X \Pi_Y)\).

Proof

Let X and Y be two categorical variables with q and r levels respectively. Let \(\mathbf{X}\) and \(\mathbf{Y}\) denote their matrices of q (resp. r) uncentred indicator variables, and X and Y their matrices of \(q-1\) (resp. \(r-1\)) centred indicator variables. We have:

$$\begin{aligned} \langle X\rangle =\langle \mathbf{X}\rangle \cap \langle \mathbb{1}\rangle ^{\perp };\quad \langle Y\rangle =\langle \mathbf{Y}\rangle \cap \langle \mathbb{1}\rangle ^{\perp } \end{aligned}$$

And so:

$$\begin{aligned} \Pi _X=\Pi _{\mathbf{X}}-\Pi _{\mathbb{1}};\quad \Pi _Y=\Pi _{\mathbf{Y}}-\Pi _{\mathbb{1}} \end{aligned}$$
(5)

Moreover, it can easily be shown that:

$$\begin{aligned} \mathrm{tr}(\Pi _{\mathbf{X}} \Pi _{\mathbf{Y}})=\mathrm{tr}\left( MM'\right) , \end{aligned}$$
(6)

where

$$\begin{aligned} M=(\mathbf{X}'W\mathbf{X})^{-1/2}\, \mathbf{X}'W\mathbf{Y}\, (\mathbf{Y}'W\mathbf{Y})^{-1/2} \end{aligned}$$

We have:

$$\begin{aligned} M=\left( m_{ij}\right) _{i,j} \text { with } m_{ij}=\frac{\pi _{ij}}{\sqrt{\pi _{i.}\pi _{.j}}}, \end{aligned}$$
(7)

where \(\pi_{ij}\) denotes the proportion of units exhibiting level i of X and level j of Y, and \(\pi_{i.}\), \(\pi_{.j}\) the corresponding margins.

From (5), (6) and (7), we get:

$$\begin{aligned} \mathrm{tr}(\Pi _X \Pi _Y)=\sum _{\begin{array}{c} i=1,\ldots ,q \\ j=1,\ldots ,r \end{array}} \frac{(\pi _{ij}-\pi _{i.}\pi _{.j})^2}{\pi _{i.}\pi _{.j}}=\Phi ^2(X,Y) \end{aligned}$$
\(\square \)
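Identity a) is easy to verify numerically. The following check is ours, assuming uniform weights \(W=I_n/n\): it builds the W-orthogonal projectors onto the centred indicator spaces of two simulated categorical variables and compares \(\mathrm{tr}(\Pi_X \Pi_Y)\) with \(\Phi^2\) computed from the contingency table.

```python
# Numerical check of a) (ours), with uniform weights W = I_n / n.
import numpy as np

rng = np.random.default_rng(1)
n, q, r = 200, 3, 4
x = rng.integers(0, q, n)                    # levels of X
y = rng.integers(0, r, n)                    # levels of Y
W = np.eye(n) / n

def centred_projector(labels, levels):
    ind = np.eye(levels)[labels]             # uncentred indicator matrix
    Xc = (ind - ind.mean(axis=0))[:, :-1]    # centre, drop one redundant column
    return Xc @ np.linalg.solve(Xc.T @ W @ Xc, Xc.T @ W)

Pi_X = centred_projector(x, q)
Pi_Y = centred_projector(y, r)

P = np.zeros((q, r))                         # contingency table of proportions
for i, j in zip(x, y):
    P[i, j] += 1.0 / n
pi_i, pi_j = P.sum(axis=1), P.sum(axis=0)
phi2 = ((P - np.outer(pi_i, pi_j)) ** 2 / np.outer(pi_i, pi_j)).sum()

print(np.trace(Pi_X @ Pi_Y), phi2)           # the two values coincide
```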

b) \(\forall x \ne 0:\ \mathop {\arg \min }\limits _{y\in S}\Vert x-y\Vert ^2=\frac{x}{\Vert x\Vert }\), where S is the unit sphere.

Proof

For any \(y \ne 0\), let \(y^0=\frac{y}{\Vert y\Vert }\). Then

$$\begin{aligned} \forall x,\Vert x-y^0\Vert ^2=\Vert x\Vert ^2+\Vert y^0\Vert ^2-2\langle x|y^0\rangle \end{aligned}$$

So:

$$\begin{aligned} \mathop {\arg \min }\limits _{y\in S}\Vert x-y\Vert ^2&= \mathop {\arg \max }\limits _{y\in S}\langle x\,|\,y\rangle \\&= \mathop {\arg \max }\limits _{y\in S}\big (\Vert x\Vert ^2\cos ^2(x,y)\big ) \\&= \mathop {\arg \min }\limits _{y\in S}\big (\Vert x\Vert ^2\sin ^2(x,y)\big ) \\&= {\hat{y}}^0,\quad \forall {\hat{y}}=\mathop {\arg \min }\limits _{y}\big (\Vert x\Vert ^2\sin ^2(x,y)\big ) \end{aligned}$$

Taking for instance \({\hat{y}}=x\) gives the result. \(\square \)
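A brute-force illustration of b) (ours, purely for intuition): among many random points of the unit sphere, the one closest to x is essentially \(x/\Vert x\Vert\).

```python
# Brute-force check of b) (ours): the nearest point of the unit sphere to x
# is its normalisation x / ||x||.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([2.0, -1.0, 0.5])
Y = rng.normal(size=(100_000, 3))
Y /= np.linalg.norm(Y, axis=1, keepdims=True)      # random unit vectors
best = Y[np.argmin(((Y - x) ** 2).sum(axis=1))]
print(best, x / np.linalg.norm(x))                 # directions should agree
```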

c) Rank-r average of normed projectors.

Let \(\Pi _U\) be the projector onto the space spanned by r W-orthonormal vectors \(u_1,\ldots ,u_r \in \mathbb{R}^n\), and let U denote the matrix \([u_1,\ldots ,u_r]\). The rank-r average of a set of p normed projectors \({\tilde{O}}_j\) is defined as the normed projector \(\tilde{{\bar{O}}}^r = {\tilde{O}}_U = \frac{\Pi _U}{\sqrt{r}}\) that satisfies:

$$\begin{aligned} \tilde{{\bar{O}}}^r = \mathop {\arg \min }\limits _U \sum _{j} \llbracket {\tilde{O}}_U - {\tilde{O}}_j\rrbracket ^2. \end{aligned}$$

Now, \(\forall j,\ {\tilde{O}}_j = X_j M_j X_j'W\), where:

  • \(X_j\) is the variable associated with the projector, coded as mentioned in Sect. 2.1, and

  • \(M_j = \frac{(X_j'WX_j)^{-1}}{\sqrt{\dim (X_j)}}\).

We denote \(X = [X_1,\ldots ,X_p]\) and \(M = \mathrm{diag}(M_j;\ j=1,\ldots ,p)\) (the block-diagonal matrix with blocks \(M_j\)).

Since \(\forall j,\ \llbracket {\tilde{O}}_U - {\tilde{O}}_j \rrbracket ^2 = 2\big (1 - [{\tilde{O}}_U \,|\, {\tilde{O}}_j ]\big )\), we have:

$$\begin{aligned} \tilde{{\bar{O}}}^r&= \mathop {\arg \max }\limits _U \sum _{j} [{\tilde{O}}_U \,|\, {\tilde{O}}_j ] \\&= \mathop {\arg \max }\limits _U \sum _{j} [\Pi _U \,|\, {\tilde{O}}_j ] \\&= \mathop {\arg \max }\limits _U \Big [\Pi _U \,\Big |\, \sum _{j} {\tilde{O}}_j \Big ] \\&= \mathop {\arg \max }\limits _U \mathrm{tr}\Big (UU'W \sum _{j} {\tilde{O}}_j \Big ) \\&= \mathop {\arg \max }\limits _U \mathrm{tr}\Big (U'W \sum _{j} {\tilde{O}}_j\, U\Big ) \\&= \mathop {\arg \max }\limits _U \sum _{h=1}^{r} u_h'W \sum _{j} X_j M_j X_j'W u_h \\&= \mathop {\arg \max }\limits _U \sum _{h=1}^{r} u_h'W XMX'W u_h \end{aligned}$$

Vectors \(u_1,\ldots ,u_r\) being W-orthonormal, this maximisation program is exactly that of the (dual) PCA of array X with metric M and weights W. The solution vectors \(u_1,\ldots ,u_r\) are hence the first r PCs of the triplet (X, M, W), and \(\tilde{{\bar{O}}}^r\) is the normed projector onto the space they span. \(\square \)
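The PCA characterisation in c) can also be checked numerically. The sketch below is ours, under simplifying assumptions: uniform weights \(W=I_n/n\) and p single-column centred numeric variables, so \(\dim(X_j)=1\). It computes the leading W-orthonormal PCs of the triplet (X, M, W) via the symmetrised operator \(W^{1/2}XMX'W^{1/2}\) and verifies that they dominate a random W-orthonormal r-frame on the maximised criterion.

```python
# Numerical sketch of c) (ours), assuming W = I_n / n and dim(X_j) = 1.
import numpy as np

rng = np.random.default_rng(2)
n, p, r = 50, 6, 2
W = np.eye(n) / n

blocks = []
for _ in range(p):
    xj = rng.normal(size=(n, 1))
    xj -= xj.mean()                               # centred coding of a numeric variable
    blocks.append(xj)
X = np.hstack(blocks)

# block-diagonal metric M with blocks M_j = (X_j' W X_j)^{-1} / sqrt(dim X_j)
M = np.zeros((p, p))
for j, xj in enumerate(blocks):
    M[j, j] = 1.0 / (xj.T @ W @ xj)[0, 0]         # sqrt(dim X_j) = 1 here

def criterion(U):
    """sum_h u_h' W X M X' W u_h, the quantity maximised in the proof."""
    return np.trace(U.T @ W @ X @ M @ X.T @ W @ U)

# PCs of (X, M, W): eigenvectors of W^(1/2) X M X' W^(1/2), mapped back
Wh = np.sqrt(W)                                   # W is diagonal, elementwise sqrt
vals, vecs = np.linalg.eigh(Wh @ X @ M @ X.T @ Wh)
U_pca = np.linalg.solve(Wh, vecs[:, -r:])         # W-orthonormal leading PCs

Q, _ = np.linalg.qr(rng.normal(size=(n, r)))      # a random orthonormal frame
U_rand = np.linalg.solve(Wh, Q)                   # made W-orthonormal
print(criterion(U_pca), ">=", criterion(U_rand))  # the PCA frame attains the max
```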


Cite this article

Bry, X., Cucala, L. A von Mises–Fisher mixture model for clustering numerical and categorical variables. Adv Data Anal Classif 16, 429–455 (2022). https://doi.org/10.1007/s11634-021-00449-4
