


An empirical threshold of selection probability for analysis of high-dimensional correlated data
Journal of Statistical Computation and Simulation (IF 1.2), Pub Date: 2020-03-11, DOI: 10.1080/00949655.2020.1739286
Kipoong Kim, Jajoon Koo, Hokeun Sun


For the analysis of high-dimensional data, regularization methods based on penalized likelihood have been extensively studied over the last few decades. However, they commonly require an optimal choice of tuning parameters to select relevant variables. Although cross-validation is widely used for tuning parameter selection, its selection result is often unstable because of the random splitting of samples. As an alternative to cross-validation, computation of selection probability has been proposed for stable variable selection. Individual variables can be ranked by their selection probability, regardless of tuning parameter values. However, a theoretical threshold of selection probability fails to control the number of false discoveries when it is applied to high-dimensional correlated data. In this article, we propose a new strategy to compute an empirical threshold of selection probability. The selection performance of the proposed threshold is evaluated through extensive simulation studies and high-dimensional genomic data analysis.
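The idea of a selection probability can be illustrated with a small, generic sketch. It is not the authors' procedure and does not compute their empirical threshold; it only shows, under stated assumptions, how often a lasso fit on random half-samples keeps each variable. The function names (`lasso_cd`, `selection_probability`), the penalty grid, the convention of taking the maximum frequency over the grid, and the simulated data are all illustrative assumptions, and the data are centered by construction so no intercept is fitted.

```python
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (column j added back).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def selection_probability(X, y, lambdas, n_subsamples=40, seed=0):
    """Per variable: largest selection frequency over the penalty grid (assumed convention)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    best = [0.0] * p
    for lam in lambdas:
        counts = [0] * p
        for _ in range(n_subsamples):
            idx = rng.sample(range(n), n // 2)  # random half-sample without replacement
            beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
            for j in range(p):
                if beta[j] != 0.0:
                    counts[j] += 1
        for j in range(p):
            best[j] = max(best[j], counts[j] / n_subsamples)
    return best


# Hypothetical toy data: 60 samples, 8 variables, only variables 0 and 1 carry signal.
data_rng = random.Random(1)
n, p = 60, 8
X = [[data_rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[0] - 3.0 * row[1] + data_rng.gauss(0.0, 0.5) for row in X]

probs = selection_probability(X, y, lambdas=[0.5, 1.0])
ranking = sorted(range(p), key=lambda j: -probs[j])  # most frequently selected first
print("selection probabilities:", [round(q, 2) for q in probs])
print("ranking:", ranking)
```

Because the ranking depends only on selection frequencies, it does not require committing to a single tuning parameter value. Choosing a cutoff on those frequencies is the separate problem the article addresses.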


Updated: 2020-03-11
