Feature Selection for Machine Learning Algorithms that Bounds False Positive Rate
arXiv - STAT - Methodology Pub Date : 2022-08-05 , DOI: arxiv-2208.02948
Mehdi Rostami, Olli Saarela

Selecting a handful of truly relevant variables in supervised machine learning is challenging: the assumptions required for valid selection are typically untestable, and theoretical guarantees that selection errors are under control are generally unavailable. We propose a distribution-free feature selection method, referred to as Data Splitting Selection (DSS), which controls the False Discovery Rate (FDR) of feature selection while achieving high power. A second version of DSS is also proposed that attains higher power and "almost" controls the FDR. No assumptions are made on the distribution of the response or on the joint distribution of the features. Extensive simulations compare the performance of the proposed methods with existing ones.
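The abstract does not spell out the DSS algorithm, but the core idea of data-splitting selection can be illustrated generically: fit or screen features on two independent halves of the data and keep only features that look relevant in both, which sharply reduces false discoveries. The sketch below is a minimal illustrative example of this generic principle (simple correlation screening on each half), not the authors' actual DSS procedure; the threshold, sample sizes, and screening statistic are all assumptions chosen for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 600 samples, 50 features, the first 5 truly relevant.
n, p, k = 600, 50, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:k] = 2.0
y = X @ beta + rng.standard_normal(n)

def split_select(X, y, threshold=0.2, seed=1):
    """Generic data-splitting screen (illustrative, not the paper's DSS):
    keep feature j only if |corr(X_j, y)| exceeds the threshold on BOTH
    independent halves of the data. Requiring agreement on two disjoint
    halves makes it unlikely that a null feature is selected by chance."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    half = len(y) // 2
    selected = []
    for j in range(X.shape[1]):
        corrs = [
            abs(np.corrcoef(X[part, j], y[part])[0, 1])
            for part in (idx[:half], idx[half:])
        ]
        if min(corrs) > threshold:
            selected.append(j)
    return selected

selected = split_select(X, y)
print(selected)
```

Under this setup, a null feature passes one half's screen with probability roughly P(|Z| > threshold * sqrt(n/2)), so requiring both halves squares that probability; that squaring effect is what a data-splitting scheme exploits to keep false positives rare.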

Updated: 2022-08-08