Abstract
We introduce variable selection procedures based on depth similarity, aimed at identifying a small subset of variables that best explains the depth assigned to each point in space. Our study is not intended for the high-dimensional setting. Identifying noisy and dependent variables helps us understand the underlying distribution of a given dataset. We study the asymptotic behaviour of the proposed methods and numerical aspects of their computational burden, and we analyse simulations and a real data example.
Acknowledgements
The authors would like to thank the Centro de Cómputos de Alto Rendimiento (CeCAR) for granting use of the computational resources on which most of the experiments included in this work were performed. They also thank the anonymous reviewers for their careful reading and for insightful comments and suggestions that improved the manuscript.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work was partially supported by Grant PICT 2018-00740 from ANPCyT, Buenos Aires, Argentina, and by the Spanish Agencia Estatal de Investigación (AEI) and Fondo Europeo de Desarrollo Regional (FEDER), Grant CTM2016-79741-R for the MICROAIPOLAR project.
Appendix
Proof of Theorem 1.
Proof
The convergence \(I_n(k)\buildrel {\mathrm{a.s.}}\over \longrightarrow I_\infty (k)\) stated in the theorem must be understood as follows: for almost every \(\omega\) there exists \(n_0=n_0(\omega )\) such that \(I_n(k)=I_\infty (k)\) for \(n\ge n_0.\) Denote the objective function of the population definition (1) by h(I) and the objective function of the estimation (3) by \(h_n(I)\), i.e.
$$h(I)={\mathbb {E}}[q(D({\mathbf {X}}[I],P[I]))-q(D({\mathbf {X}},P))]^2 \quad \text{ and } \quad h_n(I)=\frac{1}{n}\sum _{j=1}^n[q(D({\mathbf {X}}_j[I],P_n[I]))-q(D({\mathbf {X}}_j,P_n))]^2.$$
In order to prove the consistency stated in the theorem, since the number of subsets I is finite and the minimizer in (1) is unique, it is enough to prove that for all I
$$h_n(I)\buildrel {\mathrm{a.s.}}\over \longrightarrow h(I). \qquad \qquad (11)$$
Denote \(A_n(I)=\frac{1}{n}\sum _{j=1}^n[q(D({\mathbf {X}}_j[I],P[I]))-q(D({\mathbf {X}}_j,P))]^2\). By the law of large numbers, \(A_n(I)\buildrel {\mathrm{a.s.}}\over \longrightarrow h(I)\); hence, writing \(h_n(I)=(h_n(I)-A_n(I))+A_n(I)\), to prove (11) it suffices to show that \(|h_n(I)-A_n(I)|\buildrel {\mathrm{a.s.}}\over \longrightarrow 0\). After some calculations we get
where \(S_i\) is the ith average in the last equation. Let us first see that \(S_2\buildrel {\mathrm{a.s.}}\over \longrightarrow 0\). Writing \(Y_{n,j}(\omega )=D({\mathbf {X}}_j(\omega ),P_n(\omega ))\) and \(Y_j(\omega )=D({\mathbf {X}}_j(\omega ),P)\), we have that \(|Y_{n,j}(\omega )-Y_j(\omega )|\le s_n(\omega )\), with \(s_n(\omega )\) independent of j, and therefore \(Y_{n,j}\buildrel {\mathrm{a.s.}}\over \longrightarrow Y_j\). In general, denote by \(F_{Y}\) the cumulative distribution function of a random variable Y. By basic properties of convergence, \(F_{Y_{n,1}}\buildrel {D}\over \longrightarrow F_{Y_1}\), and since \(F_{Y_1}\) is continuous, \(F_{Y_{n,1}}\buildrel {u}\over \longrightarrow F_{Y_1}\). Then, since \(F_{Y_{n,j}}=F_{Y_{n,1}}\) and \(F_{Y_j}=F_{Y_1}\), we have that
where the first \(\epsilon\) in the last equation is due to the uniform convergence \(F_{Y_{n,1}}\buildrel {u}\over \longrightarrow F_{Y_1}\), which holds for \(n\ge n_0\), and the second \(\epsilon\) is due to the uniform continuity of \(F_{Y_1}\) together with the almost sure convergence \(Y_{n,j}\buildrel {\mathrm{a.s.}}\over \longrightarrow Y_j\). Indeed, for a fixed \(\omega\) for which (5) holds, there exists \(n_1(\omega )\) such that \(n\ge n_1(\omega )\) implies \(|F_{Y_1}(Y_{n,j}(\omega ))-F_{Y_1}(Y_j(\omega ))|<\epsilon\). Then, given \(\epsilon >0\) and \(\omega\) for which (5) holds, taking \(n_2(\omega )=\max \{n_0,n_1(\omega )\}\), if \(n\ge n_2(\omega )\) then \(|q(D({\mathbf {X}}_j,P_n))-q(D({\mathbf {X}}_j,P))|<2\epsilon\). Now, using that the quantiles take values in [0, 1], that the function \(f(x)=x^2\) satisfies \(0\le f^{\prime }(c)\le 2\) for \(0\le c\le 1\), and Lagrange's mean value theorem, we have that \(|q^2(D({\mathbf {X}}_j,P_n))-q^2(D({\mathbf {X}}_j,P))|< 4 \epsilon\) for all j and \(n\ge n_2(\omega )\). The same bound then holds for the average \(S_2\), which proves that \(S_2\buildrel {\mathrm{a.s.}}\over \longrightarrow 0\).
The proof that \(S_1\buildrel {\mathrm{a.s.}}\over \longrightarrow 0\) runs along the same lines as that for \(S_2\). Using that \(S_1\buildrel {\mathrm{a.s.}}\over \longrightarrow 0\) and \(S_2\buildrel {\mathrm{a.s.}}\over \longrightarrow 0\), it is straightforward to see that \(S_3\buildrel {\mathrm{a.s.}}\over \longrightarrow 0\) as well, and hence \(h_n(I)\buildrel {\mathrm{a.s.}}\over \longrightarrow h(I)\).\(\square\)
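For concreteness, the empirical objective \(h_n(I)\) appearing in this proof can be sketched numerically. The following Python snippet is a minimal illustration under assumptions of ours, not the authors' implementation: it uses a Mahalanobis-type depth as a stand-in depth function and the empirical distribution function of the depths as the quantile transform q; the names `mahalanobis_depth`, `q` and `h_n` are hypothetical.

```python
import numpy as np

def mahalanobis_depth(X):
    """Mahalanobis-type depth of each row of X w.r.t. the sample itself:
    D(x, P_n) = 1 / (1 + (x - mean)' Cov^{-1} (x - mean))."""
    mu = X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))  # atleast_2d handles 1-D subsets
    diff = X - mu
    d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
    return 1.0 / (1.0 + d2)

def q(depths):
    """Quantile (ECDF) transform of the depths; values lie in (0, 1]."""
    ranks = np.argsort(np.argsort(depths))
    return (ranks + 1) / len(depths)

def h_n(X, I):
    """Empirical objective: average squared difference between the
    quantile-transformed depths on the subset I and on all variables."""
    return np.mean((q(mahalanobis_depth(X[:, I])) - q(mahalanobis_depth(X))) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))   # three i.i.d. standard normal variables
h_pair = h_n(X, [0, 1])         # a subset of two variables
h_single = h_n(X, [2])          # a single variable
print(h_pair, h_single)         # expected: the two-variable subset tracks the full depths more closely
```

Minimizing \(h_n(I)\) over subsets of a given size, as in (3), then amounts to evaluating this quantity over the candidate subsets.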
Proof of Theorem 2.
Proof
Since the cardinality of the subsets I with \(|I|=k\) is finite for every k, it will suffice to prove that the empirical correlation \(\rho _n(\mathbf {D}_n(I),\mathbf {D}_n)\) of (4) converges almost surely to \(\rho (D(I),D({\mathbf {X}},P))\) of (2). Recall that \(\mathbf {D}_n=(D({\mathbf {X}}_1,P_n),\ldots ,D({\mathbf {X}}_n,P_n))\) and \(\mathbf {D}_n(I)=(D({\mathbf {X}}_1(I),P_n(I)),\ldots ,D({\mathbf {X}}_n(I),P_n(I)))\). Denote by \(\mathbf {D}^*_n=(D({\mathbf {X}}_1,P),\ldots ,D({\mathbf {X}}_n,P))\) and \(\mathbf {D}^*_n(I)=(D({\mathbf {X}}_1(I),P(I)),\ldots ,D({\mathbf {X}}_n(I),P(I)))\). We have by the triangle inequality that, in order to prove \(|\rho _n(\mathbf {D}_n(I),\mathbf {D}_n)-\rho (D(I),D({\mathbf {X}},P))|\buildrel {\mathrm{a.s.}}\over \longrightarrow 0\), it suffices to prove the following:
$$|\rho _n(\mathbf {D}^*_n(I),\mathbf {D}^*_n)-\rho (D(I),D({\mathbf {X}},P))|\buildrel {\mathrm{a.s.}}\over \longrightarrow 0, \qquad \qquad (12)$$
$$|\rho _n(\mathbf {D}_n(I),\mathbf {D}_n)-\rho _n(\mathbf {D}^*_n(I),\mathbf {D}^*_n)|\buildrel {\mathrm{a.s.}}\over \longrightarrow 0. \qquad \qquad (13)$$
We will first prove (12). By definition of the empirical correlation, we have that
$$\rho _n(\mathbf {D}^*_n(I),\mathbf {D}^*_n)=\frac{\text{ Cov}_n(\mathbf {D}^*_n(I),\mathbf {D}^*_n)}{\sqrt{\text{ Var}_n(\mathbf {D}^*_n(I))\,\text{ Var}_n(\mathbf {D}^*_n)}}, \qquad \qquad (14)$$
where \(\text{ Cov}_n\) and \(\text{ Var}_n\) are the empirical covariance and variance, respectively. On the other hand,
$$\rho (D(I),D({\mathbf {X}},P))=\frac{\text{ Cov }(D(I),D({\mathbf {X}},P))}{\sqrt{\text{ Var }(D(I))\,\text{ Var }(D({\mathbf {X}},P))}}. \qquad \qquad (15)$$
Using the strong law of large numbers, we get almost sure convergence both for the numerator and the denominator of (14) to the numerator and the denominator of (15), respectively. Since both \(\text{ Var }(D(I))\) and \(\text{ Var }(D({\mathbf {X}},P))\) are different from 0 we get that (12) holds, as desired.
We now prove (13). By definition, we have
$$\rho _n(\mathbf {D}_n(I),\mathbf {D}_n)=\frac{\text{ Cov}_n(\mathbf {D}_n(I),\mathbf {D}_n)}{\sqrt{\text{ Var}_n(\mathbf {D}_n(I))\,\text{ Var}_n(\mathbf {D}_n)}}. \qquad \qquad (16)$$
Based on the expressions for \(\rho _n(\mathbf {D}^*_n(I),\mathbf {D}^*_n)\) and \(\rho _n(\mathbf {D}_n(I),\mathbf {D}_n)\) in (14) and (16), respectively, and since the denominator of (14) has a limit different from zero, to prove (13) it will suffice to show that the numerators and the denominators of (14) and (16) approach each other a.s., i.e.,
$$|\text{ Cov}_n(\mathbf {D}^*_n(I),\mathbf {D}^*_n)-\text{ Cov}_n(\mathbf {D}_n(I),\mathbf {D}_n)|\buildrel {\mathrm{a.s.}}\over \longrightarrow 0, \qquad \qquad (17)$$
$$|\text{ Var}_n(\mathbf {D}^*_n(I))\,\text{ Var}_n(\mathbf {D}^*_n)-\text{ Var}_n(\mathbf {D}_n(I))\,\text{ Var}_n(\mathbf {D}_n)|\buildrel {\mathrm{a.s.}}\over \longrightarrow 0. \qquad \qquad (18)$$
We will concentrate on proving (17); the proof of (18) is analogous. First note that, since (5) holds, given any \(\varepsilon >0\) there exists \(n_0=n_0(\varepsilon )\) such that, with probability one, \(\Vert \mathbf {D}_n(I)-\mathbf {D}^*_n(I)\Vert _\infty <\varepsilon\) and \(\Vert \mathbf {D}_n-\mathbf {D}^*_n\Vert _\infty <\varepsilon\) for \(n\ge n_0\). Note also that the coordinates of \(\mathbf {D}_n^*(I)\) and \(\mathbf {D}_n^*\) lie in [0, 1], being depths. Elementary calculations then show that \(|\text{ Cov}_n(\mathbf {D}^*_n(I),\mathbf {D}^*_n)-\text{ Cov}_n(\mathbf {D}_n(I),\mathbf {D}_n)|\le 4\varepsilon +\varepsilon ^2\) for \(n\ge n_0\), from which (17) follows.\(\square\)
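The empirical correlation \(\rho _n(\mathbf {D}_n(I),\mathbf {D}_n)\) controlled in this proof can likewise be sketched in Python. This is an illustrative sketch under our own assumptions (a Mahalanobis-type depth as a stand-in; the names `rho_n` and `mahalanobis_depth` are hypothetical), not the authors' code:

```python
import numpy as np

def mahalanobis_depth(X):
    """Mahalanobis-type depth of each row of X w.r.t. the sample itself."""
    mu = X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))  # atleast_2d handles 1-D subsets
    diff = X - mu
    d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
    return 1.0 / (1.0 + d2)

def rho_n(X, I):
    """Empirical correlation between the subset depth vector D_n(I)
    and the full depth vector D_n."""
    return np.corrcoef(mahalanobis_depth(X[:, I]), mahalanobis_depth(X))[0, 1]

rng = np.random.default_rng(1)
X = rng.normal(size=(800, 3))   # three i.i.d. standard normal variables
rho_pair = rho_n(X, [0, 1])     # two of the three variables
rho_single = rho_n(X, [2])      # one variable only
print(rho_pair, rho_single)
```

Maximizing \(\rho _n(\mathbf {D}_n(I),\mathbf {D}_n)\) over subsets I of a fixed size then yields the criterion of (4).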
Proof of Corollary 1.
Proof
Since there are finitely many subsets I, it suffices to prove that \(\frac{1}{n}\sum _{j=1}^n[q(D_{n,j}(I))-q(D({\mathbf {X}}_j,P_n))]^2+\lambda |I|\) converges almost surely to \({\mathbb {E}}[q(D(I))-q(D({\mathbf {X}},P))]^2+\lambda |I|\), which follows from Theorem 1.\(\square\)
Proof of Lemma 1.
Proof
Let \(\lambda _1<\lambda _2\). To prove that \(K^*(\lambda _2)\le K^*(\lambda _1)\), it suffices to show that \(c_k+\lambda _2 k > c_{K^*(\lambda _1)}+\lambda _2 K^*(\lambda _1)\) for every \(k>K^*(\lambda _1)\).
Let \(k>K^*(\lambda _1)\); then \(c_{K^*(\lambda _1)}+\lambda _1 K^*(\lambda _1)\le c_k +\lambda _1 k\). Adding \((\lambda _2-\lambda _1)K^*(\lambda _1)\) to both sides and using that \((\lambda _2-\lambda _1)K^*(\lambda _1)<(\lambda _2-\lambda _1)k\), we obtain \(c_{K^*(\lambda _1)}+\lambda _2 K^*(\lambda _1)< c_k+\lambda _2 k\), which completes the proof.
\(\square\)
Proof of Lemma 2.
Proof
The proof has two steps: first we show that if \(\lambda \ge \varepsilon\) then \(K^*(\lambda )\le k_0\), and then that if \(\lambda < d/(k_0-1)\) then \(K^*(\lambda )\ge k_0\).
Step 1 We will show that \(K^*(\varepsilon )\le k_0\). Consider \(k>k_0\); from H1 we have that \(c_k=c_{k_0}+(c_{k}-c_{k_0})=c_{k_0}-{\tilde{\varepsilon }}(k-k_0)\), with \(0<{\tilde{\varepsilon }}\le \varepsilon\). Hence,
$$\begin{aligned} c_k+\varepsilon k&= c_{k_0}+\varepsilon k_0 + (\varepsilon -{\tilde{\varepsilon }})(k-k_0)\\ & \ge c_{k_0}+\varepsilon k_0, \end{aligned}$$
thus we conclude that \(K^*(\varepsilon )\le k_0\). Moreover, from Lemma 1, \(K^*(\lambda )\le k_0\) for every \(\lambda \ge \varepsilon\).
Step 2 We prove that if \(\lambda < d/(k_0-1)\), then \(K^*(\lambda )\ge k_0\); that is, we show that if \(k<k_0\) then \(c_k+\lambda k > c_{k_0}+\lambda k_0\). Let \(k<k_0\); then
$$\begin{aligned} c_k+\lambda k&\ge (c_{k_0}+d)+ \lambda \\ &> c_{k_0}+\lambda (k_0-1)+\lambda \\ &= c_{k_0}+\lambda k_0, \end{aligned}$$
where the first inequality uses that \(c_k\ge c_{k_0}+d\) and \(k\ge 1\), and the second holds since \(\lambda < d/(k_0-1).\) \(\square\)
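Lemmas 1 and 2 can be checked numerically on a toy cost sequence. The sketch below is ours: the sequence `c` and the values of `eps`, `d` and `k0` are made up so as to satisfy hypothesis H1 (a drop of at least d before \(k_0\), decrements of at most \(\varepsilon\) beyond it), and `K_star` is a hypothetical helper computing the smallest minimizer of \(c_k+\lambda k\).

```python
import numpy as np

def K_star(c, lam):
    """Smallest minimizer of c_k + lam * k over k = 1, ..., len(c)."""
    ks = np.arange(1, len(c) + 1)
    return int(ks[np.argmin(c + lam * ks)])

# made-up costs with k0 = 3: a drop of at least d = 0.5 before k0 and
# decrements of at most eps = 0.01 beyond k0, as in hypothesis H1
c = np.array([1.00, 0.70, 0.10, 0.095, 0.090, 0.085])
eps, d, k0 = 0.01, 0.5, 3

# Lemma 1: K*(lambda) is non-increasing in lambda
sizes = [K_star(c, lam) for lam in (0.0, 0.004, 0.05, 0.3, 1.0)]
assert all(a >= b for a, b in zip(sizes, sizes[1:]))

# Lemma 2: any lambda with eps <= lambda < d / (k0 - 1) recovers k0
recovered = [K_star(c, lam) for lam in (0.01, 0.05, 0.1, 0.2)]
print(sizes, recovered)
```

Here \(d/(k_0-1)=0.25\), so every penalty in \([0.01,\,0.25)\) selects exactly \(k_0=3\) variables, while larger penalties select fewer and smaller ones select more.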
Alvarez, A., Svarc, M. A variable selection procedure for depth measures. AStA Adv Stat Anal 105, 247–271 (2021). https://doi.org/10.1007/s10182-021-00391-y