Powerful knockoffs via minimizing reconstructability
February 2022
Asher Spector, Lucas Janson
Ann. Statist. 50(1): 252-276 (February 2022). DOI: 10.1214/21-AOS2104

Abstract

Model-X knockoffs (J. R. Stat. Soc. Ser. B. Stat. Methodol. 80 (2018) 551–577) allows analysts to perform feature selection using almost any machine learning algorithm while provably controlling the expected proportion of false discoveries. This procedure involves constructing synthetic variables, called knockoffs, which effectively act as controls during feature selection. The gold standard for constructing knockoffs has been to minimize the mean absolute correlation (MAC) between features and their knockoffs, but, surprisingly, we prove this procedure can be powerless in extremely easy settings, including Gaussian linear models with correlated exchangeable features. The key problem is that minimizing the MAC creates joint dependencies between the features and knockoffs, which allow machine learning algorithms to reconstruct the effect of the features on the response using the knockoffs. To improve power, we propose generating knockoffs that minimize the reconstructability (MRC) of the features, and we demonstrate our proposal for Gaussian features by showing it is computationally efficient, robust, and powerful. We also prove that certain MRC knockoffs minimize a notion of estimation error in Gaussian linear models. Through extensive simulations, we show MRC knockoffs often dramatically outperform MAC-minimizing knockoffs, and we find no settings in which MAC-minimizing knockoffs outperform MRC knockoffs by more than a slight margin. We implement our methods and many others from the knockoffs literature in a new Python package, knockpy.
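
The joint-dependency problem described in the abstract can be illustrated numerically. The following is a minimal sketch using only NumPy, not the knockpy API. It assumes equicorrelated (exchangeable) Gaussian features with unit variances, restricts the knockoff construction to a single scalar s (so that the feature-knockoff correlation is 1 - s), and uses the trace of the inverse joint covariance as one possible stand-in for a reconstructability criterion. The function names, the grid search, and the choice of criterion are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def equicorrelated(p, rho):
    """Covariance with unit variances and pairwise correlation rho."""
    return (1 - rho) * np.eye(p) + rho * np.ones((p, p))

def joint_cov(Sigma, s):
    """Joint covariance of (features, knockoffs) for S = s * I."""
    S = s * np.eye(Sigma.shape[0])
    return np.block([[Sigma, Sigma - S], [Sigma - S, Sigma]])

def mac_s(Sigma):
    """Scalar s minimizing |1 - s| subject to 0 <= s <= 2 * lambda_min(Sigma)."""
    lam_min = np.linalg.eigvalsh(Sigma)[0]
    return min(1.0, 2 * lam_min)

def trace_inv_s(Sigma, grid_size=999):
    """Scalar s minimizing trace(G^{-1}) over an interior grid (illustrative criterion)."""
    lam_min = np.linalg.eigvalsh(Sigma)[0]
    grid = np.linspace(0, 2 * lam_min, grid_size + 2)[1:-1]  # drop singular endpoints
    losses = [np.trace(np.linalg.inv(joint_cov(Sigma, s))) for s in grid]
    return grid[int(np.argmin(losses))]

p, rho = 10, 0.6  # hypothetical dimension and correlation
Sigma = equicorrelated(p, rho)

s_mac = mac_s(Sigma)
s_alt = trace_inv_s(Sigma)

# Smallest eigenvalue of the joint covariance: a value near zero means some
# linear combination of features and knockoffs is (almost) deterministic.
min_eig_mac = np.linalg.eigvalsh(joint_cov(Sigma, s_mac))[0]
min_eig_alt = np.linalg.eigvalsh(joint_cov(Sigma, s_alt))[0]

print(f"MAC choice:       s = {s_mac:.3f}, min eigenvalue = {min_eig_mac:.2e}")
print(f"trace-inv choice: s = {s_alt:.3f}, min eigenvalue = {min_eig_alt:.2e}")
```

In this setting with rho at least 0.5, the MAC-minimizing choice pushes s to its upper bound, which makes the joint covariance singular, whereas the trace-based choice keeps it well conditioned at the cost of a somewhat higher feature-knockoff correlation.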

Funding Statement

L. J. was partially supported by the William F. Milton Fund.

Acknowledgments

The authors would like to thank Chenguang Dai, Buyu Lin, Jun Liu, Wenshuo Wang, and Xin Xing for valuable discussions and suggestions. The authors are also grateful to the anonymous referees for helpful comments.

Citation


Asher Spector, Lucas Janson. "Powerful knockoffs via minimizing reconstructability." Ann. Statist. 50 (1) 252-276, February 2022. https://doi.org/10.1214/21-AOS2104

Information

Received: 1 December 2020; Revised: 1 May 2021; Published: February 2022
First available in Project Euclid: 16 February 2022

MathSciNet: MR4382016
zbMATH: 1486.62130
Digital Object Identifier: 10.1214/21-AOS2104

Subjects:
Primary: 62G10
Secondary: 62J12

Keywords: false discovery rate (FDR), high-dimensional inference, knockoffs, model-X, power, variable selection

Rights: Copyright © 2022 Institute of Mathematical Statistics

JOURNAL ARTICLE
25 PAGES

