
A von Mises–Fisher mixture model for clustering numerical and categorical variables

Regular Article · Advances in Data Analysis and Classification

Abstract

This work presents a mixture model for clustering variables of different types. All variables being measured on the same n statistical units, we first represent each variable by a unit-norm operator in \(\mathbb{R}^{n\times n}\) endowed with an appropriate inner product. We then propose a von Mises–Fisher mixture model on the unit sphere containing these operators. The parameters of the mixture model are estimated with an EM algorithm, combined with a K-means procedure to obtain a good starting point. The method is tested on simulated data and finally applied to wine data.
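At a high level, the estimation combines a spherical K-means initialisation with EM updates for a von Mises–Fisher mixture. The Python sketch below is our illustration, not the authors' code: it implements the generic recipe of Banerjee et al. (2005) for unit vectors, including their standard concentration approximation, and omits the paper's coding of variables as operators on the sphere of \(\mathbb{R}^{n\times n}\); all names are illustrative.

```python
# Minimal sketch (ours, not the authors' code): EM for a von Mises-Fisher
# mixture on unit vectors, initialised by a spherical K-means pass.
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel function

def log_vmf(X, mu, kappa):
    """Log-density of vMF(mu, kappa) at the unit row vectors of X."""
    d = X.shape[1]
    # log normalising constant; ive(v, k) = iv(v, k) * exp(-k) avoids overflow
    log_c = ((d / 2 - 1) * np.log(kappa) - (d / 2) * np.log(2 * np.pi)
             - (np.log(ive(d / 2 - 1, kappa)) + kappa))
    return log_c + kappa * (X @ mu)

def fit_movmf(X, K, n_iter=100, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # --- spherical K-means initialisation (good starting point) ---
    mu = X[rng.choice(n, K, replace=False)]
    for _ in range(20):
        z = np.argmax(X @ mu.T, axis=1)          # nearest mean direction
        for k in range(K):
            m = X[z == k].sum(axis=0)
            if np.linalg.norm(m) > 0:
                mu[k] = m / np.linalg.norm(m)
    pi, kappa = np.full(K, 1.0 / K), np.full(K, 10.0)
    # --- EM iterations ---
    for _ in range(n_iter):
        # E-step: posterior membership probabilities
        logp = np.log(pi) + np.column_stack(
            [log_vmf(X, mu[k], kappa[k]) for k in range(K)])
        logp -= logp.max(axis=1, keepdims=True)
        gamma = np.exp(logp)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: mixture weights, mean directions, concentrations
        Nk = gamma.sum(axis=0)
        pi = Nk / n
        R = gamma.T @ X                          # resultant vectors per cluster
        r_norm = np.linalg.norm(R, axis=1)
        mu = R / r_norm[:, None]
        rbar = r_norm / Nk
        kappa = (rbar * d - rbar ** 3) / (1 - rbar ** 2)  # standard approximation
    return pi, mu, kappa, gamma
```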



Author information

Correspondence to Lionel Cucala.


Appendix: proofs

a) \(\Phi^2(X,Y)=\mathrm{tr}(\Pi_X \Pi_Y)\).

Proof

Let X and Y be two categorical variables with q and r levels respectively. Let \(\mathbf{X}\) and \(\mathbf{Y}\) denote their matrices of q (resp. r) uncentred indicator variables, and X and Y their matrices of \(q-1\) (resp. \(r-1\)) centred indicator variables. We have:

$$\begin{aligned} \langle X\rangle =\langle \mathbf{X}\rangle \cap \langle \mathbb{1}\rangle ^{\perp };\quad \langle Y\rangle =\langle \mathbf{Y}\rangle \cap \langle \mathbb{1}\rangle ^{\perp } \end{aligned}$$

And so:

$$\begin{aligned} \Pi _X=\Pi _{\mathbf{X}}-\Pi _{\mathbb{1}};\quad \Pi _Y=\Pi _{\mathbf{Y}}-\Pi _{\mathbb{1}} \end{aligned}$$
(5)

Moreover, it can easily be shown that:

$$\begin{aligned} \mathrm{tr}(\Pi _{\mathbf{X}} \Pi _{\mathbf{Y}})=\mathrm{tr}\left( MM'\right) , \end{aligned}$$
(6)

where

$$\begin{aligned} M=(\mathbf{X}'W\mathbf{X})^{-1/2}\, \mathbf{X}'W\mathbf{Y}\, (\mathbf{Y}'W\mathbf{Y})^{-1/2} \end{aligned}$$

We have:

$$\begin{aligned} M=\left( m_{ij}\right) _{i,j} \text { with } m_{ij}=\frac{\pi _{ij}}{\sqrt{\pi _{i.}\pi _{.j}}}, \end{aligned}$$
(7)

where \(\pi_{ij}\) denotes the proportion of units exhibiting level i of X and level j of Y, and \(\pi_{i.}\), \(\pi_{.j}\) the corresponding margins.

From (5), (6) and (7), we get:

$$\begin{aligned} \mathrm{tr}(\Pi _X \Pi _Y)=\sum _{\begin{array}{c} i=1,\ldots ,q \\ j=1,\ldots ,r \end{array}} \frac{(\pi _{ij}-\pi _{i.}\pi _{.j})^2}{\pi _{i.}\pi _{.j}}=\Phi ^2(X,Y) \end{aligned}$$
\(\square \)
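Identity a) is easy to verify numerically. The following check is ours, assuming uniform weights \(W=I_n/n\): it builds the W-orthogonal projectors onto the centred indicator spaces of two simulated categorical variables and compares \(\mathrm{tr}(\Pi_X \Pi_Y)\) with \(\Phi^2\) computed from the contingency table.

```python
# Numerical check of a) (ours), with uniform weights W = I_n / n.
import numpy as np

rng = np.random.default_rng(1)
n, q, r = 200, 3, 4
x = rng.integers(0, q, n)                    # levels of X
y = rng.integers(0, r, n)                    # levels of Y
W = np.eye(n) / n

def centred_projector(labels, levels):
    ind = np.eye(levels)[labels]             # uncentred indicator matrix
    Xc = (ind - ind.mean(axis=0))[:, :-1]    # centre, drop one redundant column
    return Xc @ np.linalg.solve(Xc.T @ W @ Xc, Xc.T @ W)

Pi_X = centred_projector(x, q)
Pi_Y = centred_projector(y, r)

P = np.zeros((q, r))                         # contingency table of proportions
for i, j in zip(x, y):
    P[i, j] += 1.0 / n
pi_i, pi_j = P.sum(axis=1), P.sum(axis=0)
phi2 = ((P - np.outer(pi_i, pi_j)) ** 2 / np.outer(pi_i, pi_j)).sum()

print(np.trace(Pi_X @ Pi_Y), phi2)           # the two values coincide
```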

b) \(\forall x \ne 0:\ \mathop {\arg \min }\limits _{y\in S}\Vert x-y\Vert ^2=\frac{x}{\Vert x\Vert }\), where S is the unit sphere.

Proof

For any \(y \ne 0\), let \(y^0=\frac{y}{\Vert y\Vert }\). Then

$$\begin{aligned} \forall x,\Vert x-y^0\Vert ^2=\Vert x\Vert ^2+\Vert y^0\Vert ^2-2\langle x|y^0\rangle \end{aligned}$$

So:

$$\begin{aligned} \mathop {\arg \min }\limits _{y\in S}\Vert x-y\Vert ^2&= \mathop {\arg \max }\limits _{y\in S}\langle x\,|\,y\rangle \\&= \mathop {\arg \max }\limits _{y\in S}\big (\Vert x\Vert ^2\cos ^2(x,y)\big ) \\&= \mathop {\arg \min }\limits _{y\in S}\big (\Vert x\Vert ^2\sin ^2(x,y)\big ) \\&= {\hat{y}}^0,\quad \forall {\hat{y}}=\mathop {\arg \min }\limits _{y}\big (\Vert x\Vert ^2\sin ^2(x,y)\big ) \end{aligned}$$

Taking for instance \({\hat{y}}=x\) gives the result. \(\square \)
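A brute-force illustration of b) (ours, purely for intuition): among many random points of the unit sphere, the one closest to x is essentially \(x/\Vert x\Vert\).

```python
# Brute-force check of b) (ours): the nearest point of the unit sphere to x
# is its normalisation x / ||x||.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([2.0, -1.0, 0.5])
Y = rng.normal(size=(100_000, 3))
Y /= np.linalg.norm(Y, axis=1, keepdims=True)      # random unit vectors
best = Y[np.argmin(((Y - x) ** 2).sum(axis=1))]
print(best, x / np.linalg.norm(x))                 # directions should agree
```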

c) Rank-r average of normed projectors.

Let \(\Pi _U\) be the projector onto the space spanned by r W-orthonormal vectors \(u_1,\ldots ,u_r \in \mathbb{R}^n\), and let U denote the matrix \([u_1,\ldots ,u_r]\). The rank-r average of a set of p normed projectors \({\tilde{O}}_j\) is defined as the normed projector \(\tilde{{\bar{O}}}^r = {\tilde{O}}_U = \frac{\Pi _U}{\sqrt{r}}\) that satisfies:

$$\begin{aligned} \tilde{{\bar{O}}}^r = \mathop {\arg \min }\limits _U \sum _{j} \llbracket {\tilde{O}}_U - {\tilde{O}}_j\rrbracket ^2. \end{aligned}$$

Now, \(\forall j,\ {\tilde{O}}_j = X_j M_j X_j'W\), where:

  • \(X_j\) is the variable associated with the projector, coded as mentioned in Sect. 2.1, and

  • \(M_j = \frac{(X_j'WX_j)^{-1}}{\sqrt{\dim (X_j)}}\).

We denote \(X = [X_1,\ldots ,X_p]\) and \(M = \mathrm{diag}(M_j;\ j=1,\ldots ,p)\) (the block-diagonal matrix with blocks \(M_j\)).

Since \(\forall j,\ \llbracket {\tilde{O}}_U - {\tilde{O}}_j \rrbracket ^2 = 2\big (1 - [{\tilde{O}}_U \,|\, {\tilde{O}}_j ]\big )\), we have:

$$\begin{aligned} \tilde{{\bar{O}}}^r&= \mathop {\arg \max }\limits _U \sum _{j} [{\tilde{O}}_U \,|\, {\tilde{O}}_j ] \\&= \mathop {\arg \max }\limits _U \sum _{j} [\Pi _U \,|\, {\tilde{O}}_j ] \\&= \mathop {\arg \max }\limits _U \Big [\Pi _U \,\Big |\, \sum _{j} {\tilde{O}}_j \Big ] \\&= \mathop {\arg \max }\limits _U \mathrm{tr}\Big (UU'W \sum _{j} {\tilde{O}}_j \Big ) \\&= \mathop {\arg \max }\limits _U \mathrm{tr}\Big (U'W \sum _{j} {\tilde{O}}_j\, U\Big ) \\&= \mathop {\arg \max }\limits _U \sum _{h=1}^{r} u_h'W \sum _{j} X_j M_j X_j'W u_h \\&= \mathop {\arg \max }\limits _U \sum _{h=1}^{r} u_h'W XMX'W u_h \end{aligned}$$

Vectors \(u_1,\ldots ,u_r\) being W-orthonormal, this maximisation program is exactly that of the (dual) PCA of array X with metric M and weights W. The solution vectors \(u_1,\ldots ,u_r\) are hence the first r PCs of the triplet (X, M, W), and \(\tilde{{\bar{O}}}^r\) is the normed projector onto the space they span. \(\square \)
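The PCA characterisation in c) can also be checked numerically. The sketch below is ours, under simplifying assumptions: uniform weights \(W=I_n/n\) and p single-column centred numeric variables, so \(\dim(X_j)=1\). It computes the leading W-orthonormal PCs of the triplet (X, M, W) via the symmetrised operator \(W^{1/2}XMX'W^{1/2}\) and verifies that they dominate a random W-orthonormal r-frame on the maximised criterion.

```python
# Numerical sketch of c) (ours), assuming W = I_n / n and dim(X_j) = 1.
import numpy as np

rng = np.random.default_rng(2)
n, p, r = 50, 6, 2
W = np.eye(n) / n

blocks = []
for _ in range(p):
    xj = rng.normal(size=(n, 1))
    xj -= xj.mean()                               # centred coding of a numeric variable
    blocks.append(xj)
X = np.hstack(blocks)

# block-diagonal metric M with blocks M_j = (X_j' W X_j)^{-1} / sqrt(dim X_j)
M = np.zeros((p, p))
for j, xj in enumerate(blocks):
    M[j, j] = 1.0 / (xj.T @ W @ xj)[0, 0]         # sqrt(dim X_j) = 1 here

def criterion(U):
    """sum_h u_h' W X M X' W u_h, the quantity maximised in the proof."""
    return np.trace(U.T @ W @ X @ M @ X.T @ W @ U)

# PCs of (X, M, W): eigenvectors of W^(1/2) X M X' W^(1/2), mapped back
Wh = np.sqrt(W)                                   # W is diagonal, elementwise sqrt
vals, vecs = np.linalg.eigh(Wh @ X @ M @ X.T @ Wh)
U_pca = np.linalg.solve(Wh, vecs[:, -r:])         # W-orthonormal leading PCs

Q, _ = np.linalg.qr(rng.normal(size=(n, r)))      # a random orthonormal frame
U_rand = np.linalg.solve(Wh, Q)                   # made W-orthonormal
print(criterion(U_pca), ">=", criterion(U_rand))  # the PCA frame attains the max
```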


Cite this article

Bry, X., Cucala, L. A von Mises–Fisher mixture model for clustering numerical and categorical variables. Adv Data Anal Classif 16, 429–455 (2022). https://doi.org/10.1007/s11634-021-00449-4
