Robust composite weighted quantile screening for ultrahigh dimensional discriminant analysis

Metrika

Abstract

This paper is concerned with feature screening for ultrahigh dimensional discriminant analysis. We propose a new feature screening procedure based on conditional quantiles. The proposed procedure has several desirable features. First, it is model-free: it does not require a specific discriminant model and can be applied directly to multi-category problems. Second, it is robust against heavy-tailed distributions, potential outliers and small sample sizes in some categories, all of which are common in high dimensional data. We establish the sure screening property and the ranking consistency property of the proposed procedure under some regularity conditions. Simulation studies and a real data example are used to assess its finite sample performance.

[Figs. 1–6 not reproduced]

References

  • Armstrong SA, Staunton JE, Silverman LB, Pieters R, Den Boer ML, Minden MD, Sallan SE, Lander ES, Golub TR, Korsmeyer SJ (2002) MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nat Genet 30:41–47

  • Chen X, Chen X, Liu Y (2017) A note on quantile feature screening via distance correlation. Stat Papers 60:1741–1762

  • Cheng G, Li X, Lai P, Song F, Yu J (2017) Robust rank screening for ultrahigh dimensional discriminant analysis. Stat Comput 27:535–545

  • Cui H, Li R, Zhong W (2015) Model-free feature screening for ultrahigh dimensional discriminant analysis. J Am Stat Assoc 110:630–641

  • Fan J, Fan Y (2008) High dimensional classification using features annealed independence rules. Ann Stat 36:2605–2637

  • Fan J, Lv J (2008) Sure independence screening for ultrahigh dimensional feature space. J R Stat Soc Ser B 70:849–911

  • Fan J, Song R (2010) Sure independence screening in generalized linear models with NP-dimensionality. Ann Stat 38:3567–3604

  • Fan J, Feng Y, Song R (2011) Nonparametric independence screening in sparse ultra-high dimensional additive models. J Am Stat Assoc 106:544–557

  • Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J Am Stat Assoc 58:13–30

  • Lai P, Song F, Chen K, Liu Z (2017) Model free feature screening with dependent variable in ultrahigh dimensional binary classification. Stat Probab Lett 125:141–148

  • Li R, Zhong W, Zhu L (2012) Feature screening via distance correlation learning. J Am Stat Assoc 107:1129–1139

  • Liu J, Li R, Wu R (2014) Feature selection for varying coefficient models with ultrahigh-dimensional covariates. J Am Stat Assoc 109:266–274

  • Lo SH, Singh K (1986) The product-limit estimator and the bootstrap: some asymptotic representations. Probab Theory Relat Fields 71:455–465

  • Mai Q, Zou H (2013) The Kolmogorov filter for variable screening in high-dimensional binary classification. Biometrika 100:229–234

  • Pan R, Wang H, Li R (2016) Ultrahigh dimensional multi-class linear discriminant analysis by pairwise sure independence screening. J Am Stat Assoc 111:169–179

  • Song F, Lai P, Shen B, Cheng G (2018) Variance ratio screening for ultrahigh dimensional discriminant analysis. Commun Stat Theory Methods 47:6034–6051

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267–288

  • Tibshirani R, Hastie T, Narasimhan B, Chu G (2002) Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci 99:6567–6572

  • Wu Y, Yin G (2015) Conditional quantile screening in ultrahigh-dimensional heterogeneous data. Biometrika 102:65–76

  • Zhu L, Li L, Li R, Zhu L (2011) Model-free feature screening for ultrahigh-dimensional data. J Am Stat Assoc 106:1464–1475

Acknowledgements

Peng Lai’s research was supported by the National Natural Science Foundation of China (Grant No. 11771215) and the Natural Science Foundation of Jiangsu Province (Grant No. BK20161530).

Author information

Correspondence to Peng Lai.

Appendix

To prove the two theorems, we present the following lemma.

Lemma 1

[Hoeffding’s Inequality; Hoeffding (1963)] Let \(X_1,\ldots ,X_n\) be independent random variables. Assume that \(P(X_i\in [a_i,b_i])=1\) for \(1\le {i}\le {n}\), where \(a_i\) and \(b_i\) are constants. Let \(\overline{X}=\frac{1}{n}\sum _{i=1}^n{X_i}\). Then the following inequality holds:

$$\begin{aligned} P\left( \big |\overline{X}-E(\overline{X})\big |\ge {t}\right) \le {2\exp \left\{ -\frac{2n^{2}t^{2}}{\sum _{i=1}^n{(b_i-a_i)^{2}}}\right\} }, \end{aligned}$$

where t is a positive constant and \(E(\overline{X})\) is the expected value of \(\overline{X}\).
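As a quick numerical sanity check of Lemma 1, the sketch below compares a Monte Carlo estimate of \(P(|\overline{X}-E(\overline{X})|\ge t)\) with the Hoeffding bound for i.i.d. Uniform[0, 1] variables, so that \(a_i=0\) and \(b_i=1\). The distribution, the values of n and t, and the function names `hoeffding_bound` and `empirical_tail` are illustrative assumptions, not part of the paper.

```python
import math
import random

def hoeffding_bound(n, t, a, b):
    """Right-hand side of Lemma 1 when every X_i takes values in [a, b]."""
    # With identical ranges, sum_i (b_i - a_i)^2 = n * (b - a)^2.
    return 2.0 * math.exp(-2.0 * n ** 2 * t ** 2 / (n * (b - a) ** 2))

def empirical_tail(n, t, reps=10000, seed=0):
    """Monte Carlo estimate of P(|Xbar - E(Xbar)| >= t) for X_i ~ Uniform[0, 1]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.random() for _ in range(n)) / n
        if abs(xbar - 0.5) >= t:  # E(Xbar) = 0.5 for Uniform[0, 1]
            hits += 1
    return hits / reps

n, t = 50, 0.1
print(f"empirical: {empirical_tail(n, t):.4f}, Hoeffding bound: {hoeffding_bound(n, t, 0.0, 1.0):.4f}")
```

The bound is distribution-free, so it is typically far from tight for any particular distribution; its value in the proofs is that it holds uniformly.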

Proof of Theorem 1

According to the definitions of \(\omega _{j}\) and \(\hat{\omega }_{j}\), we have

$$\begin{aligned}&P \left\{ |\hat{\omega }_{j}-\omega _{j}|\ge \varepsilon \right\} \\&\quad =P\left\{ \Big |\frac{1}{M}\sum _{k=1}^{M}\sum _{r=1}^{R_{n}}\hat{p}_{r}\left( \widehat{Q}_{\tau _{k}}(X_{j}|Y=y_{r})-\widehat{Q}_{\tau _{k}}(X_{j})\right) ^{2}\right. \\&\qquad -\,\frac{1}{M}\sum _{k=1}^{M}\sum _{r=1}^{R_{n}}p_{r}\left( Q_{\tau _{k}}(X_{j}|Y=y_{r})-Q_{\tau _{k}}(X_{j})\right) ^{2}\\&\qquad \left. +\,\frac{1}{M}\sum _{k=1}^{M}\sum _{r=1}^{R_{n}}p_{r}\left( Q_{\tau _{k}}(X_{j}|Y=y_{r})-Q_{\tau _{k}}(X_{j})\right) ^{2}-\omega _{j}\Big |\ge \varepsilon \right\} \\&\qquad \triangleq P\left\{ \big |\hat{\omega }_{j}-\tilde{\omega }_{j}+\tilde{\omega }_{j}-\omega _{j}\big |\ge \varepsilon \right\} , \end{aligned}$$

where \(\tilde{\omega }_{j}=\frac{1}{M}\sum _{k=1}^{M}\sum _{r=1}^{R_{n}}p_{r}\Big (Q_{\tau _{k}}(X_{j}|Y=y_{r})-Q_{\tau _{k}}(X_{j})\Big )^{2}\). By the approximation property of the integral, \(|\tilde{\omega }_{j}-\omega _{j}|=O(M^{-2})\); hence, when \(M> \sqrt{2/\varepsilon }\), we have \(|\tilde{\omega }_{j}-\omega _{j}|\le \frac{\varepsilon }{2}\), so that \(P\{|\hat{\omega }_{j}-\omega _{j}|\ge \varepsilon \}\le P\{|\hat{\omega }_{j}-\tilde{\omega }_{j}|\ge \varepsilon /2\}\). Moreover,

$$\begin{aligned}&P\left\{ |\hat{\omega }_{j}-\tilde{\omega }_{j}|\ge \varepsilon /2 \right\} \\&\quad \le \sum _{k=1}^{M} P\left\{ \Big |\sum _{r=1}^{R_{n}} \left[ \hat{p}_{r}\left( \widehat{Q}_{\tau _{k}}(X_{j}|Y=y_{r})-\widehat{Q}_{\tau _{k}}(X_{j})\right) ^{2}\right. \right. \\&\qquad \left. \left. -\,p_{r}\left( Q_{\tau _{k}}(X_{j}|Y=y_{r})-Q_{\tau _{k}}(X_{j})\right) ^{2}\right] \Big |\ge \varepsilon /2\right\} \\&\quad \le \sum _{k=1}^{M} P\left\{ \Big |\sum _{r=1}^{R_{n}} \left[ \hat{p}_{r}\left( \widehat{Q}_{\tau _{k}}(X_{j}|Y=y_{r})-\widehat{Q}_{\tau _{k}}(X_{j})\right) ^{2}\right. \right. \\&\qquad -\,\hat{p}_{r}\left( Q_{\tau _{k}}(X_{j}|Y=y_{r})-Q_{\tau _{k}}(X_{j})\right) ^{2}\\&\qquad \left. \left. +\,(\hat{p}_{r}-p_{r})\left( Q_{\tau _{k}}(X_{j}|Y=y_{r})-Q_{\tau _{k}}(X_{j})\right) ^{2}\right] \Big |\ge \varepsilon /2\right\} . \end{aligned}$$

In fact,

$$\begin{aligned}&\left( \widehat{Q}_{\tau _{k}}(X_{j}|Y=y_{r})-\widehat{Q}_{\tau _{k}}(X_{j})\right) ^{2}-\left( Q_{\tau _{k}}(X_{j}|Y=y_{r})-Q_{\tau _{k}}(X_{j})\right) ^{2}\\&\quad = \left\{ 2\left[ [Q_{\tau _{k}}(X_{j}|Y=y_{r})-Q_{\tau _{k}}(X_{j})]\right. \right. \\&\qquad \left. [(\hat{Q}_{\tau _{k}}(X_{j}|Y=y_{r})-Q_{\tau _{k}}(X_{j}|Y=y_{r})) +(Q_{\tau _{k}}(X_{j})-\hat{Q}_{\tau _{k}}(X_{j}))]\right] \\&\qquad \left. +\,\left[ (\hat{Q}_{\tau _{k}}(X_{j}|Y=y_{r})-Q_{\tau _{k}}(X_{j}|Y=y_{r})) +(Q_{\tau _{k}}(X_{j})-\hat{Q}_{\tau _{k}}(X_{j}))\right] ^2\right\} . \end{aligned}$$
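The expansion above is the algebraic identity \((a'-b')^2-(a-b)^2=2(a-b)e+e^2\) with \(e=(a'-a)+(b-b')\), where primes denote estimated quantiles. A minimal numerical check follows; the function names `lhs` and `rhs` are hypothetical, and random numbers stand in for the four quantiles.

```python
import math
import random

def lhs(qc_hat, q_hat, qc, q):
    """Left side: (Qhat(.|y_r) - Qhat)^2 - (Q(.|y_r) - Q)^2."""
    return (qc_hat - q_hat) ** 2 - (qc - q) ** 2

def rhs(qc_hat, q_hat, qc, q):
    """Right side: 2*d*e + e^2 with d the population gap and e the estimation error."""
    d = qc - q                       # Q_tau(X_j | Y = y_r) - Q_tau(X_j)
    e = (qc_hat - qc) + (q - q_hat)  # conditional-quantile error plus unconditional error
    return 2 * d * e + e ** 2

rng = random.Random(0)
for _ in range(1000):
    vals = [rng.uniform(-5, 5) for _ in range(4)]  # (qc_hat, q_hat, qc, q)
    assert math.isclose(lhs(*vals), rhs(*vals), rel_tol=1e-9, abs_tol=1e-9)
print("identity holds on 1000 random draws")
```

Written this way, the difference is controlled by the estimation error e, which is what the uniform quantile rate in the next step bounds.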

By Condition (C1) and Lemma 3 of Lo and Singh (1986), we have \(\sup _{\tau _{k}\in (0,1)}\big |\widehat{Q}_{\tau _{k}}(X_{j}|Y=y_{r})-Q_{\tau _{k}}(X_{j}|Y=y_{r}) \big |=O(n^{-1/2}(\log (n))^{1/2})\) and \(\sup _{\tau _{k}\in (0,1)}\big |\widehat{Q}_{\tau _{k}}(X_{j})-Q_{\tau _{k}}(X_{j}) \big |=O(n^{-1/2}(\log (n))^{1/2})\). For \(0<\alpha <\frac{1}{2}\), take n large enough that \(\frac{\log n}{n^{1-2\alpha }}\le c_1\varepsilon ^2\) for some positive constant \(c_1\); this yields \(\sum _{r=1}^{R_{n}}\hat{p}_{r}\Big [\Big (\widehat{Q}_{\tau _{k}}(X_{j}|Y=y_{r})-\widehat{Q}_{\tau _{k}}(X_{j})\Big )^{2}- \Big (Q_{\tau _{k}}(X_{j}|Y=y_{r})-Q_{\tau _{k}}(X_{j})\Big )^{2}\Big ]\le \frac{\varepsilon }{4}\). Therefore,

$$\begin{aligned} P \left\{ |\hat{\omega }_{j}-\tilde{\omega }_{j}|\ge \varepsilon /4 \right\}\le & {} \sum _{k=1}^{M} P\left\{ \Big |\sum _{r=1}^{R_{n}}(\hat{p}_{r}-p_{r})\Big |\ge \frac{\varepsilon }{4c_{1}}\right\} \\\le & {} \sum _{k=1}^{M} \sum _{r=1}^{R_{n}}P\left\{ \big |\hat{p}_{r}-p_{r}\big |\ge \frac{\varepsilon }{4R_{n}c_{1}}\right\} . \end{aligned}$$

Now define \(Z_{i,r}=I\{Y_{i}=y_{r}\}-p_{r}\). For any fixed r, the variables \(Z_{i,r}\), \(i=1,\ldots ,n\), are independent with \(E(Z_{i,r})=0\) and \(Z_{i,r}\in [-p_{r},1-p_{r}]\), an interval of length 1. Noting that \(\hat{p}_{r}-p_{r}=\frac{1}{n}\sum _{i=1}^n Z_{i,r}\), Hoeffding’s Inequality gives

$$\begin{aligned} P\left( \big |\hat{p}_{r}-p_{r}\big |>\varepsilon \right) =P\left( \Big |\frac{1}{n}\sum _{i=1}^{n}Z_{i,r}\Big |>\varepsilon \right) \le 2\exp \{-2n\varepsilon ^2\}. \end{aligned}$$
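The bound for the class proportions can be checked by simulation. The sketch below treats each indicator \(I\{Y_i=y_r\}\) as Bernoulli with a hypothetical class probability \(p_r=0.2\), so that \(\hat{p}_r\) is a binomial proportion, and compares the empirical frequency of \(|\hat{p}_r-p_r|>\varepsilon\) with \(2\exp\{-2n\varepsilon^2\}\). All numerical values and the function names are illustrative assumptions.

```python
import math
import random

def freq_deviation_prob(n, p_r, eps, reps=5000, seed=1):
    """Monte Carlo estimate of P(|p_hat_r - p_r| > eps) when I{Y_i = y_r} ~ Bernoulli(p_r)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # p_hat_r is the sample mean of the indicators I{Y_i = y_r}
        p_hat = sum(rng.random() < p_r for _ in range(n)) / n
        if abs(p_hat - p_r) > eps:
            hits += 1
    return hits / reps

def hoeffding_class_bound(n, eps):
    """2 exp(-2 n eps^2): Hoeffding's bound for summands in an interval of length 1."""
    return 2.0 * math.exp(-2.0 * n * eps ** 2)

n, p_r, eps = 100, 0.2, 0.1
print(f"empirical: {freq_deviation_prob(n, p_r, eps):.4f}, bound: {hoeffding_class_bound(n, eps):.4f}")
```

The bound does not depend on \(p_r\), which is why the same inequality applies to every class, including small ones.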

Then, we can get

$$\begin{aligned} P \left\{ |\hat{\omega }_{j}-\tilde{\omega }_{j}|\ge \frac{\varepsilon }{4}\right\} \le 2 M R_{n}\exp \left\{ -\frac{n \varepsilon ^2}{8c_{1}^2R^2_{n}}\right\} . \end{aligned}$$

Taking \(M=O(n^{\beta })\) and \(R_{n}=O(n^{\alpha })\) with \(0<\alpha <\frac{1}{2}\) and \(0\le \kappa <\frac{1}{2}-\alpha \), setting \(\varepsilon =cn^{-\kappa }\) and applying the union bound over \(j=1,\ldots ,p\), we have

$$\begin{aligned} P\left( \max \limits _{1\le {j}\le {p}}|\hat{\omega }_{j}-\omega _{j}|\ge {cn^{-\kappa }}\right) \le {O\left( p(n^{\beta +\alpha })\exp \left\{ -cn^{1-2\alpha -2\kappa }\right\} \right) }. \end{aligned}$$
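Reading off the exponent, the right-hand side vanishes as long as \(\log p\) grows more slowly than \(n^{1-2\alpha -2\kappa }\), since the polynomial factor \(n^{\beta +\alpha }\) is dominated by the exponential term. The sketch below evaluates the logarithm of this rate with the unspecified O-constant and c both set to 1, an assumption made only for illustration; `log_tail_rate` is a hypothetical helper.

```python
import math

def log_tail_rate(n, log_p, alpha, beta, kappa, c=1.0):
    """Log of p * n^(beta + alpha) * exp(-c * n^(1 - 2*alpha - 2*kappa)).

    The O(.) constant is set to 1 (hypothetical). log_p is passed directly so
    that an exponentially large p does not overflow.
    """
    if not (0 < alpha < 0.5 and 0 <= kappa < 0.5 - alpha):
        raise ValueError("need 0 < alpha < 1/2 and 0 <= kappa < 1/2 - alpha")
    return log_p + (beta + alpha) * math.log(n) - c * n ** (1 - 2 * alpha - 2 * kappa)

# Example: log p = n^0.2, so p grows faster than any polynomial in n.
alpha, beta, kappa = 0.1, 0.2, 0.1
for n in (100, 1000, 10000):
    print(n, round(log_tail_rate(n, n ** 0.2, alpha, beta, kappa), 2))
```

With these illustrative values the exponent is \(n^{0.6}\), which outgrows \(\log p=n^{0.2}\), so the log-rate becomes increasingly negative.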

Next, we prove the second part of Theorem 1. If \(\mathcal {A}\nsubseteq \mathcal {\hat{A}}\), then there must exist some \(j\in \mathcal {A}\) such that \(\hat{\omega }_j<cn^{-\kappa }\). It then follows from Condition (C2) that \(|\hat{\omega }_j-\omega _j|>{cn^{-\kappa }}\) for some \(j\in \mathcal {A}\). Hence \(\{\mathcal {A}\nsubseteq \mathcal {\hat{A}}\}\subseteq \{|\hat{\omega }_j-\omega _j|>{cn^{-\kappa }} \text{ for some } j\in \mathcal {A}\}\), or equivalently \(\{\max \limits _{j\in \mathcal {A}}|\hat{\omega }_j-\omega _j|\le {cn^{-\kappa }}\}\subseteq \{\mathcal {A}\subseteq \mathcal {\hat{A}}\}\). Consequently, for \(0\le \kappa <{\frac{1}{2}-\alpha }\) and \(0<\alpha <\frac{1}{2}\),

$$\begin{aligned} P\left( \mathcal {A}\subseteq \mathcal {\hat{A}}\right)\ge & {} P\left( \max \limits _{j\in \mathcal {A}}|\hat{\omega }_j-\omega _j|\le {cn^{-\kappa }}\right) =1-P\left( \max \limits _{j\in \mathcal {A}}|\hat{\omega }_j-\omega _j|>cn^{-\kappa }\right) \\\ge & {} 1-s_nP\left( |\hat{\omega }_j-\omega _j|>cn^{-\kappa }\right) \\\ge & {} 1-O\left( s_{n}(n^{\beta +\alpha })\exp \left\{ -cn^{1-2\alpha -2\kappa }\right\} \right) , \end{aligned}$$

where \(s_n\) is the cardinality of \(\mathcal {A}\). This completes the proof of Theorem 1. \(\square \)
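The proof works with the screened set \(\mathcal {\hat{A}}=\{j:\hat{\omega }_j\ge cn^{-\kappa }\}\). The following minimal sketch computes \(\hat{\omega }_j\) as written in the proof and forms this set. It assumes the grid \(\tau _k=k/(M+1)\), \(k=1,\ldots ,M\), and uses the left-continuous inverse of the empirical distribution function as \(\widehat{Q}_{\tau }\). Neither choice is taken from this section, and the helper names and the values of c and \(\kappa \) are hypothetical.

```python
from collections import defaultdict

def empirical_quantile(sorted_x, k, m):
    """inf{x : F_n(x) >= k/m}; integer arithmetic gives the index ceil(n*k/m) - 1 exactly."""
    n = len(sorted_x)
    return sorted_x[-(-n * k // m) - 1]

def omega_hat(x, y, M=19):
    """(1/M) sum_k sum_r p_hat_r (Q_hat(tau_k | y_r) - Q_hat(tau_k))^2, tau_k = k/(M+1) (assumed grid)."""
    n = len(x)
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[yi].append(xi)
    overall = sorted(x)
    total = 0.0
    for xs in groups.values():
        xs_sorted = sorted(xs)
        p_hat = len(xs) / n  # empirical class proportion p_hat_r
        for k in range(1, M + 1):
            diff = empirical_quantile(xs_sorted, k, M + 1) - empirical_quantile(overall, k, M + 1)
            total += p_hat * diff ** 2
    return total / M

def screen(X_columns, y, c, kappa, M=19):
    """Return {j : omega_hat_j >= c * n^(-kappa)} together with all statistics."""
    n = len(y)
    threshold = c * n ** (-kappa)
    stats = [omega_hat(col, y, M) for col in X_columns]
    selected = {j for j, w in enumerate(stats) if w >= threshold}
    return selected, stats

X_columns = [[0] * 5 + [10] * 5,   # feature 0: separates the two classes
             [1, 2, 3, 4, 5] * 2]  # feature 1: same values in both classes
y = [0] * 5 + [1] * 5
selected, stats = screen(X_columns, y, c=1.0, kappa=0.5)
print(stats, selected)  # prints [50.0, 0.0] {0}
```

Because only order statistics enter \(\hat{\omega }_j\), a few extreme observations move the statistic far less than they would move a mean-based criterion, which is consistent with the robustness motivation in the abstract.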

Proof of Theorem 2

Suppose that \(\delta =\min \limits _{j\in \mathcal {A}}\omega _j-\max \limits _{j\in \mathcal {I}}\omega _{j}>0\). Then

$$\begin{aligned} P\left( \min \limits _{j\in \mathcal {A}}\hat{\omega }_j>\max \limits _{j\in \mathcal {I}}\hat{\omega }_{j}\right)= & {} 1-P\left( \min \limits _{j\in \mathcal {A}}\hat{\omega }_j\le \max \limits _{j\in \mathcal {I}}\hat{\omega }_{j}\right) \\= & {} 1-P\left( \min \limits _{j\in \mathcal {A}}\hat{\omega }_j-\min \limits _{j\in \mathcal {A}}\omega _j+\delta \le \max \limits _{j\in \mathcal {I}}\hat{\omega }_{j}-\max \limits _{j\in \mathcal {I}}\omega _{j}\right) \\\ge & {} 1-P\left( \max \limits _{j\in \mathcal {I}}|\hat{\omega }_{j}-\omega _{j}|\ge \frac{\delta }{2}\right) -P\left( \max \limits _{j\in \mathcal {A}}|\hat{\omega }_{j}-\omega _{j}|\ge \frac{\delta }{2}\right) \\\ge & {} 1-O\left( p(n^{\beta +\alpha })\exp \left\{ -c\delta ^{2}n^{1-2\alpha }\right\} \right) . \end{aligned}$$

\(\square \)
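Theorem 2 can be illustrated with a small simulation. The sketch below assumes two balanced classes, s = 5 active features whose class-1 distribution is shifted by 2, and standard normal noise for every feature. It computes \(\hat{\omega }_j\) with the grid \(\tau _k=k/(M+1)\) and empirical quantiles (both assumptions), then checks whether every active feature outranks every inactive one. The design and all names are hypothetical, and a single replication illustrates rather than verifies the theorem.

```python
import random
from collections import defaultdict

def empirical_quantile(sorted_x, k, m):
    """inf{x : F_n(x) >= k/m}, using integer arithmetic for the index ceil(n*k/m) - 1."""
    n = len(sorted_x)
    return sorted_x[-(-n * k // m) - 1]

def omega_hat(x, y, M=19):
    """(1/M) sum_k sum_r p_hat_r (Q_hat(tau_k | y_r) - Q_hat(tau_k))^2 with tau_k = k/(M+1) (assumed)."""
    n = len(x)
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[yi].append(xi)
    overall = sorted(x)
    total = 0.0
    for xs in groups.values():
        xs_sorted = sorted(xs)
        p_hat = len(xs) / n
        for k in range(1, M + 1):
            d = empirical_quantile(xs_sorted, k, M + 1) - empirical_quantile(overall, k, M + 1)
            total += p_hat * d * d
    return total / M

def simulate(n=200, p=50, s=5, shift=2.0, seed=2020):
    """Hypothetical design: features 0..s-1 are shifted by `shift` in class 1; the rest are noise."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]  # two balanced classes
    stats = []
    for j in range(p):
        mu = shift if j < s else 0.0
        x = [rng.gauss(mu * yi, 1.0) for yi in y]
        stats.append(omega_hat(x, y))
    return stats

stats = simulate()
active, inactive = stats[:5], stats[5:]
print(f"min active: {min(active):.3f}, max inactive: {max(inactive):.3f}")
print("ranking consistent:", min(active) > max(inactive))
```

Here \(\delta \) is large relative to the sampling noise of \(\hat{\omega }_j\), which is the regime in which the bound in Theorem 2 is informative.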

About this article

Cite this article

Song, F., Lai, P. & Shen, B. Robust composite weighted quantile screening for ultrahigh dimensional discriminant analysis. Metrika 83, 799–820 (2020). https://doi.org/10.1007/s00184-019-00758-x

  • DOI: https://doi.org/10.1007/s00184-019-00758-x
