The Holdout Randomization Test for Feature Selection in Black Box Models
Journal of Computational and Graphical Statistics (IF 1.4), Pub Date: 2021-07-26, DOI: 10.1080/10618600.2021.1923520
Wesley Tansey, Victor Veitch, Haoran Zhang, Raul Rabadan, David M. Blei

Abstract

We propose the holdout randomization test (HRT), an approach to feature selection using black box predictive models. The HRT is a specialized version of the conditional randomization test (CRT) that uses data splitting for feasible computation. The HRT works with any predictive model and produces a valid p-value for each feature. To make the HRT more practical, we propose a set of extensions to maximize power and speed up computation. In simulations, these extensions lead to greater power than a competing knockoffs-based approach, without sacrificing control of the error rate. We apply the HRT to two case studies from the scientific literature where heuristics were originally used to select important features for predictive models. The results illustrate how such heuristics can be misleading relative to principled methods like the HRT. Code is available at https://github.com/tansey/hrt. Supplementary materials for this article are available online.
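As a rough illustration of the procedure the abstract describes, the following Python sketch computes an HRT-style p-value for a single feature. It is not the authors' implementation (the repository linked above is the reference for that). The function names (`fit_ols`, `predict_ols`, `hrt_pvalue`), the ordinary least-squares predictor standing in for a black box model, the held-out mean squared error used as the test statistic, and the Gaussian linear model for the conditional distribution of one feature given the others are all assumptions chosen to keep the example short.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns [intercept, coefficients...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_ols(coef, X):
    """Predictions from a coefficient vector produced by fit_ols."""
    return coef[0] + X @ coef[1:]


def hrt_pvalue(X_train, y_train, X_test, y_test, j, n_draws=200, rng=None):
    """Sketch of a holdout randomization test p-value for feature j.

    Assumptions (not taken from the source): OLS as the predictive model,
    held-out MSE as the test statistic, and X_j | X_{-j} modeled as a
    Gaussian whose mean is linear in the other features.
    """
    rng = np.random.default_rng(rng)

    # 1. Fit the predictive model once, on the training split only.
    model = fit_ols(X_train, y_train)

    # 2. Observed test statistic: loss on the held-out split.
    t_obs = np.mean((y_test - predict_ols(model, X_test)) ** 2)

    # 3. Assumed conditional model for feature j given the other features.
    others = np.delete(np.arange(X_train.shape[1]), j)
    cond = fit_ols(X_train[:, others], X_train[:, j])
    resid_sd = np.std(X_train[:, j] - predict_ols(cond, X_train[:, others]))
    cond_mean = predict_ols(cond, X_test[:, others])

    # 4. Resample feature j on the held-out split and re-evaluate the loss.
    #    The model is never refit; only predictions are recomputed.
    X_null = X_test.copy()
    n_at_most_obs = 0
    for _ in range(n_draws):
        X_null[:, j] = cond_mean + resid_sd * rng.standard_normal(len(X_test))
        t_null = np.mean((y_test - predict_ols(model, X_null)) ** 2)
        n_at_most_obs += int(t_null <= t_obs)

    # 5. One-sided randomization p-value: a small value means the real
    #    feature gives a lower held-out loss than its resampled copies.
    return (1 + n_at_most_obs) / (1 + n_draws)
```

In this sketch the model is fit a single time and only cheap predictions are repeated for each resampled copy of the feature, which is the role data splitting plays in making the test computationally feasible.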




Updated: 2021-07-26