Abstract
Deep neural networks are renowned for their high prediction accuracy, but they are also known for their black-box nature and poor interpretability. We consider the problem of variable selection in deep neural networks, that is, selecting the input variables that have significant predictive power on the output. Most existing variable selection methods for neural networks apply only to shallow networks or are computationally infeasible on large datasets; moreover, they lack control over the quality of the selected variables. Here we propose a backward elimination procedure called SurvNet, which is based on a new measure of variable importance that applies to a wide variety of networks. More importantly, SurvNet is able to estimate and control the false discovery rate of the selected variables empirically. Furthermore, SurvNet adaptively determines how many variables to eliminate at each step in order to maximize selection efficiency. The validity and efficiency of SurvNet are demonstrated on various simulated and real datasets, and its performance is compared with that of other methods. In particular, a systematic comparison with knockoff-based methods shows that, although they provide more rigorous false discovery rate control on data with strong variable correlations, SurvNet usually has higher power.
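To make the idea of backward elimination with an empirical false discovery rate (FDR) estimate concrete, the sketch below shows a generic loop of this kind. It is an illustration only, not SurvNet itself: here variable importance is the absolute Pearson correlation with the response, and the FDR is estimated by tracking permuted "null" copies of the input columns, both of which are simplifying stand-ins for the paper's network-based importance measure and its FDR estimator. The function name `select_variables` and all parameters are hypothetical.

```python
import numpy as np

def select_variables(X, y, fdr_target=0.1, drop_frac=0.2, seed=0):
    """Backward elimination with an empirical FDR estimate (illustrative sketch).

    Importance here is |Pearson correlation| with y; surviving permuted
    null columns serve as a proxy count of false discoveries.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Append one row-permuted (null) copy of each column; a null column
    # that survives elimination flags a likely false discovery.
    nulls = rng.permuted(X, axis=0)
    Z = np.hstack([X, nulls])
    keep = np.arange(2 * p)                      # surviving column indices
    while True:
        imp = np.abs([np.corrcoef(Z[:, j], y)[0, 1] for j in keep])
        order = np.argsort(imp)                  # least important first
        n_null = int(np.sum(keep >= p))
        n_real = keep.size - n_null
        # Empirical FDR estimate: surviving nulls proxy for false positives.
        fdr_hat = n_null / max(n_real, 1)
        if fdr_hat <= fdr_target or n_real == 0:
            break
        n_drop = max(1, int(drop_frac * keep.size))
        keep = np.delete(keep, order[:n_drop])   # eliminate a batch per step
    return np.sort(keep[keep < p])               # selected original variables
```

A typical use would simulate a response driven by a few columns of `X` and check that those columns survive elimination while most nulls and noise columns are removed; SurvNet's adaptive choice of how many variables to drop per step replaces the fixed `drop_frac` used here.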
Data availability
The simulated data (datasets 1–4) were generated using the code at https://github.com/zixuans/SurvNet/tree/master/Data. The MNIST data (dataset 5) are available at http://yann.lecun.com/exdb/mnist/. The single-cell RNA-Seq data (dataset 6) are available at the GEO repository https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE87544. The synthetic data used in the NeurIPS paper16 were simulated using the code at https://github.com/zixuans/SurvNet/tree/master/Comparisons%20with%20knockoffs/Scenario%203, and the real datasets were provided upon request from its author, Y. Lu. The synthetic data used in the AISTATS paper17 were simulated using the code at https://github.com/zixuans/SurvNet/tree/master/Comparisons%20with%20knockoffs/Scenario%204, and the two real datasets are available at https://archive.ics.uci.edu/ml/datasets/Bank+Marketing and https://archive.ics.uci.edu/ml/datasets/Polish+companies+bankruptcy+data.
Code availability
The code developed for the study of SurvNet is publicly available at the GitHub repository https://github.com/zixuans/SurvNet. The code for GL and SGL13 is publicly available at https://bitbucket.org/ispamm/group-lasso-deep-networks/src/master/. The code used to construct second-order knockoffs19 and deep knockoffs26 is available at https://github.com/msesia/knockoff-filter and https://github.com/msesia/deepknockoffs, respectively. The code for the algorithm proposed in the AISTATS paper17 is publicly available at https://github.com/jroquerogimenez/ConditionallySalientFeatures.
References
May, R., Dandy, G. & Maier, H. Review of input variable selection methods for artificial neural networks. Artif. Neural Networks 10, 16004 (2011).
Guyon, I. & Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003).
Chandrashekar, G. & Sahin, F. A survey on feature selection methods. Comput. Electr. Eng. 40, 16–28 (2014).
Battiti, R. Using mutual information for selecting features in supervised neural net learning. IEEE Trans. Neural Networks 5, 537–550 (1994).
May, R. J., Maier, H. R., Dandy, G. C. & Fernando, T. G. Non-linear variable selection for artificial neural networks using partial mutual information. Environ. Model. Software 23, 1312–1326 (2008).
Maier, H. R., Dandy, G. C. & Burch, M. D. Use of artificial neural networks for modelling cyanobacteria Anabaena spp. in the River Murray, South Australia. Ecol. Model. 105, 257–272 (1998).
Brill, F. Z., Brown, D. E. & Martin, W. N. Fast generic selection of features for neural network classifiers. IEEE Trans. Neural Networks 3, 324–328 (1992).
Tong, D. L. & Mintram, R. Genetic Algorithm-Neural Network (GANN): a study of neural network activation functions and depth of genetic algorithm search applied to feature selection. Int. J. Mach. Learn. Cybern. 1, 75–87 (2010).
Sivagaminathan, R. K. & Ramakrishnan, S. A hybrid approach for feature subset selection using neural networks and ant colony optimization. Expert Syst. Appl. 33, 49–60 (2007).
Grandvalet, Y. & Canu, S. Outcomes of the equivalence of adaptive ridge with least absolute shrinkage. In Advances in Neural Information Processing Systems 445–451 (1999).
Chapados, N. & Bengio, Y. Input decay: simple and effective soft variable selection. In IJCNN’01. International Joint Conference on Neural Networks Vol. 2, 1233–1237 (IEEE, 2001).
Similä, T. & Tikka, J. Combined input variable selection and model complexity control for nonlinear regression. Pattern Recognit. Lett. 30, 231–236 (2009).
Scardapane, S., Comminiello, D., Hussain, A. & Uncini, A. Group sparse regularization for deep neural networks. Neurocomputing 241, 81–89 (2017).
Zhang, G. P. Neural networks for classification: a survey. IEEE Trans. Syst. Man Cybernet. C 30, 451–462 (2000).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).
Lu, Y., Fan, Y., Lv, J. & Noble, W. S. DeepPINK: reproducible feature selection in deep neural networks. In Advances in Neural Information Processing Systems 8676–8686 (2018).
Gimenez, J. R., Ghorbani, A. & Zou, J. Knockoffs for the mass: new feature importance statistics with false discovery guarantees. In 22nd International Conference on Artificial Intelligence and Statistics 2125–2133 (2019).
Barber, R. F. & Candès, E. J. Controlling the false discovery rate via knockoffs. Ann. Stat. 43, 2055–2085 (2015).
Candès, E., Fan, Y., Janson, L. & Lv, J. Panning for gold: ‘model-X’ knockoffs for high dimensional controlled variable selection. J. R. Stat. Soc. B 80, 551–577 (2018).
Storey, J. D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA 100, 9440–9445 (2003).
Benjamini, Y. & Yekutieli, D. The control of the false discovery rate in multiple testing under dependency. Ann. Stat. 29, 1165–1188 (2001).
Heesen, P. et al. Inequalities for the false discovery rate (FDR) under dependence. Electron. J. Stat. 9, 679–716 (2015).
LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).
Kolodziejczyk, A. A., Kim, J. K., Svensson, V., Marioni, J. C. & Teichmann, S. A. The technology and biology of single-cell RNA sequencing. Mol. Cell 58, 610–620 (2015).
Chen, R., Wu, X., Jiang, L. & Zhang, Y. Single-cell RNA-Seq reveals hypothalamic cell diversity. Cell Rep. 18, 3227–3241 (2017).
Romano, Y., Sesia, M. & Candès, E. Deep knockoffs. J. Am. Stat. Assoc. 115, 1861–1872 (2019).
Tetko, I. V., Villa, A. E. & Livingstone, D. J. Neural network studies. 2. Variable selection. J. Chem. Inf. Comput. Sci. 36, 794–803 (1996).
Steppe, J. & Bauer, K. Jr Feature saliency measures. Comput. Math. Appl. 33, 109–126 (1997).
Sen, T. K., Oliver, R. & Sen, N. in Neural networks in the Capital Markets 325–340 (Wiley, 1995).
Yacoub, M. & Bennani, Y. HVS: a heuristic for variable selection in multilayer artificial neural network classifier. In Intelligent Engineering Systems Through Artificial Neural Networks, St. Louis, Missouri Vol. 7, 527–532 (1997).
Garson, D. G. Interpreting neural network connection weights. AI Expert 6, 47–51 (1991).
Nath, R., Rajagopalan, B. & Ryker, R. Determining the saliency of input variables in neural network classifiers. Comput. Oper. Res. 24, 767–773 (1997).
Gevrey, M., Dimopoulos, I. & Lek, S. Review and comparison of methods to study the contribution of variables in artificial neural network models. Ecol. Model. 160, 249–264 (2003).
Mozer, M. C. & Smolensky, P. Skeletonization: a technique for trimming the fat from a network via relevance assessment. In Advances in Neural Information Processing Systems 107–115 (1989).
Karnin, E. D. A simple procedure for pruning back-propagation trained neural networks. IEEE Trans. Neural Networks 1, 239–242 (1990).
LeCun, Y., Denker, J. S. & Solla, S. A. Optimal brain damage. In Advances in Neural Information Processing Systems 598–605 (1990).
Cibas, T., Soulié, F. F., Gallinari, P. & Raudys, S. Variable selection with optimal cell damage. In International Conference on Artificial Neural Networks 727–730 (Springer, 1994).
Hassibi, B. & Stork, D. G. Second order derivatives for network pruning: optimal brain surgeon. In Advances in Neural Information Processing Systems 164–171 (1993).
Dimopoulos, Y., Bourret, P. & Lek, S. Use of some sensitivity criteria for choosing networks with good generalization ability. Neural Process. Lett. 2, 1–4 (1995).
Dimopoulos, I., Chronopoulos, J., Chronopoulou-Sereli, A. & Lek, S. Neural network models to study relationships between lead concentration in grasses and permanent urban descriptors in Athens city (Greece). Ecol. Model. 120, 157–165 (1999).
Ruck, D. W., Rogers, S. K. & Kabrisky, M. Feature selection using a multilayer perceptron. J. Neural Network Comput. 2, 40–48 (1990).
Bishop, C. M. Neural Networks for Pattern Recognition (Oxford Univ. Press, 1995).
LeCun, Y. A., Bottou, L., Orr, G. B. & Müller, K.-R. in Neural Networks: Tricks of the Trade 9–48 (Springer, 2012).
Abadi, M. et al. TensorFlow: Large-scale machine learning on heterogeneous systems (2015); https://www.tensorflow.org/
Acknowledgements
This work was supported by the National Institutes of Health (R01GM120733 to J.L.), the American Cancer Society (RSG-17-206-01-TBG to J.L.) and the National Science Foundation (1925645 to J.L.).
Author information
Contributions
J.L. conceived and supervised the study. J.L. and Z.S. proposed the methods. Z.S. implemented the methods and constructed the data analysis. Z.S. drafted the manuscript and J.L. substantively revised it.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Machine Intelligence thanks the anonymous reviewers for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary methods, results, discussions, Figs. 1 and 2, and Tables 1–11.
About this article
Cite this article
Song, Z., Li, J. Variable selection with false discovery rate control in deep neural networks. Nat Mach Intell 3, 426–433 (2021). https://doi.org/10.1038/s42256-021-00308-z
This article is cited by
- scFSNN: a feature selection method based on neural network for single-cell RNA-seq data. BMC Genomics (2024)
- Predictor Selection for CNN-based Statistical Downscaling of Monthly Precipitation. Advances in Atmospheric Sciences (2023)
- Deep neural networks with controlled variable selection for the identification of putative causal genetic variants. Nature Machine Intelligence (2022)